June 23, 2022

Wasserstein t-SNE

Fynn Bachmann, Philipp Hennig, Dmitry Kobak

From arXiv; 16 pages, 10 figures; to be published at ECML/PKDD 2022

Scientific datasets often have hierarchical structure: for example, in surveys, individual participants (samples) might be grouped at a higher level (units) such as their geographical region. In these settings, the interest is often in exploring the structure on the unit level rather than on the sample level. Units can be compared based on the distance between their means; however, this ignores the within-unit distribution of samples. Here we develop an approach for exploratory analysis of hierarchical datasets using the Wasserstein distance metric, which takes into account the shapes of within-unit distributions. We use t-SNE to construct 2D embeddings of the units, based on the matrix of pairwise Wasserstein distances between them. The distance matrix can be efficiently computed by approximating each unit with a Gaussian distribution, but we also provide a scalable method to compute exact Wasserstein distances. We use synthetic data to demonstrate the effectiveness of our Wasserstein t-SNE, and apply it to data from the 2017 German parliamentary election, considering polling stations as samples and voting districts as units. The resulting embedding uncovers meaningful structure in the data.
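
The Gaussian variant described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each unit is summarized by its sample mean and covariance, and it uses the standard closed form for the 2-Wasserstein distance between two Gaussians, W2^2 = ||m1 - m2||^2 + Tr(S1 + S2 - 2 (S1^(1/2) S2 S1^(1/2))^(1/2)). The function names (`sqrtm_psd`, `gaussian_w2`, `pairwise_gaussian_w2`, `embed_units`) and the default perplexity are hypothetical choices made for this illustration, and the embedding step assumes scikit-learn is installed.

```python
import numpy as np


def sqrtm_psd(a):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # clip tiny negative eigenvalues from round-off
    return (v * np.sqrt(w)) @ v.T


def gaussian_w2(m1, s1, m2, s2):
    """Closed-form 2-Wasserstein distance between N(m1, s1) and N(m2, s2)."""
    root = sqrtm_psd(s1)
    inner = root @ s2 @ root
    inner = (inner + inner.T) / 2.0  # re-symmetrize before the second root
    cross = sqrtm_psd(inner)
    sq = np.sum((m1 - m2) ** 2) + np.trace(s1 + s2 - 2.0 * cross)
    return float(np.sqrt(max(sq, 0.0)))


def pairwise_gaussian_w2(units):
    """Distance matrix between units; each unit is an (n_samples, n_features) array."""
    means = [u.mean(axis=0) for u in units]
    covs = [np.atleast_2d(np.cov(u, rowvar=False)) for u in units]
    n = len(units)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = gaussian_w2(means[i], covs[i], means[j], covs[j])
            dist[i, j] = dist[j, i] = d
    return dist


def embed_units(dist, perplexity=5.0, seed=0):
    """2D t-SNE embedding of the units from a precomputed distance matrix."""
    # Imported lazily so the distance code above runs without scikit-learn.
    from sklearn.manifold import TSNE

    tsne = TSNE(
        n_components=2,
        metric="precomputed",  # treat the input as pairwise distances
        init="random",         # PCA init is not available for precomputed input
        perplexity=perplexity, # must be smaller than the number of units
        random_state=seed,
    )
    return tsne.fit_transform(dist)
```

Calling `pairwise_gaussian_w2` on a list of per-unit sample arrays and passing the result to `embed_units` gives one 2D point per unit. Because the distance depends on the covariances as well as the means, two units with equal means but different spreads end up a nonzero distance apart, which a comparison of means alone would miss.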
