


4 November 2019

bigMap: Big Data Mapping with Parallelized t-SNE


Joan Garriga, Frederic Bartumeus

From arXiv; 24 pages of main text, including 6 (full-page) figures; bigMap R package available on CRAN

We introduce an improved unsupervised clustering protocol specially suited for large-scale structured data. The protocol follows three steps: a dimensionality reduction of the data, a density estimation over the low-dimensional representation of the data, and a final segmentation of the density landscape. For the dimensionality reduction step we introduce a parallelized implementation of the well-known t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm that significantly alleviates some of its inherent limitations while improving its suitability for large datasets. We also introduce a new adaptive Kernel Density Estimation, specifically coupled with the t-SNE framework, to obtain accurate density estimates from the embedded data, and a variant of the rainfalling watershed algorithm to identify clusters within the density landscape. The whole mapping protocol is wrapped in the bigMap R package, together with visualization and analysis tools to ease the qualitative and quantitative assessment of the clustering.
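The last two steps of the protocol (density estimation over the embedding, then segmentation of the density landscape) can be illustrated with a minimal sketch. This is not the bigMap API and not the authors' algorithm: it assumes the points have already been embedded in 2-D (the t-SNE step is omitted), it swaps the paper's adaptive kernel for a plain fixed-bandwidth Gaussian kernel, and it stands in for the rainfalling watershed variant with a simple steepest-ascent walk on a grid. All function names and parameters (`grid_density`, `watershed_labels`, `assign_clusters`, `grid_size`, `bandwidth`) are hypothetical.

```python
import math


def grid_density(points, grid_size=40, bandwidth=0.5):
    """Fixed-bandwidth Gaussian KDE on a regular grid.

    Assumption: a fixed bandwidth replaces the adaptive kernel of the paper.
    Returns the (unnormalized) density grid and the grid frame (x0, y0, dx, dy).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    pad = 3 * bandwidth  # margin so the kernels are not cut off at the border
    x0, x1 = min(xs) - pad, max(xs) + pad
    y0, y1 = min(ys) - pad, max(ys) + pad
    dx = (x1 - x0) / (grid_size - 1)
    dy = (y1 - y0) / (grid_size - 1)
    two_var = 2 * bandwidth ** 2
    density = [[0.0] * grid_size for _ in range(grid_size)]
    for i in range(grid_size):
        for j in range(grid_size):
            gx, gy = x0 + i * dx, y0 + j * dy
            density[i][j] = sum(
                math.exp(-((gx - px) ** 2 + (gy - py) ** 2) / two_var)
                for px, py in points
            )
    return density, (x0, y0, dx, dy)


def watershed_labels(density):
    """Label every grid cell by the local density maximum it climbs to.

    Each cell repeatedly moves to its highest strictly-higher neighbour
    (8-neighbourhood); cells that end at the same peak share a label.
    """
    n, m = len(density), len(density[0])

    def uphill(i, j):
        best = (i, j)
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                a, b = i + di, j + dj
                if 0 <= a < n and 0 <= b < m:
                    if density[a][b] > density[best[0]][best[1]]:
                        best = (a, b)
        return best

    peaks = {}  # peak cell -> cluster id
    labels = [[-1] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cell = (i, j)
            nxt = uphill(*cell)
            while nxt != cell:  # density strictly increases, so this terminates
                cell, nxt = nxt, uphill(*nxt)
            labels[i][j] = peaks.setdefault(cell, len(peaks))
    return labels


def assign_clusters(points, labels, frame):
    """Give each point the label of its nearest grid cell."""
    x0, y0, dx, dy = frame
    return [
        labels[round((px - x0) / dx)][round((py - y0) / dy)]
        for px, py in points
    ]


if __name__ == "__main__":
    # Two well-separated toy blobs standing in for an embedding.
    embedded = [(0, 0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2),
                (5, 5), (5.2, 4.9), (4.9, 5.1), (5.1, 5.2)]
    dens, frame = grid_density(embedded)
    print(assign_clusters(embedded, watershed_labels(dens), frame))
```

In this sketch the number of clusters is not chosen in advance: it equals the number of local maxima of the gridded density, so it depends on the (assumed) `bandwidth` and `grid_size` values.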

