基于方言的嵌入空间的变异性和不稳定性 (Variation and Instability in Dialect-Based Embedding Spaces) - 专知论文

会员服务 ·

0

嵌入空间 · 变异性 · 嵌入 · 不稳定 · 均匀分布 ·

2023 年 3 月 27 日

Variation and Instability in Dialect-Based Embedding Spaces

翻译：基于方言的嵌入空间的变异性和不稳定性

This paper measures variation in embedding spaces which have been trained on different regional varieties of English while controlling for instability in the embeddings. While previous work has shown that it is possible to distinguish between similar varieties of a language, this paper experiments with two follow-up questions: First, does the variety represented in the training data systematically influence the resulting embedding space after training? This paper shows that differences in embeddings across varieties are significantly higher than baseline instability. Second, is such dialect-based variation spread equally throughout the lexicon? This paper shows that specific parts of the lexicon are particularly subject to variation. Taken together, these experiments confirm that embedding spaces are significantly influenced by the dialect represented in the training data. This finding implies that there is semantic variation across dialects, in addition to previously-studied lexical and syntactic variation.

翻译：本文控制嵌入的不稳定性，测量了在训练不同英语地区方言时的嵌入空间的变异性。先前的研究表明，可以区分语言的类似方言，该论文将进行两个后续问题的实验：第一，训练数据中所代表的方言是否会对训练后得到的嵌入空间产生系统影响？该文表明，在不同方言之间的嵌入差异比基线的不稳定性要高得多。第二，这种基于方言的变异是否在整个词汇中均匀分布？该文表明，词汇的特定部分特别容易受到变异的影响。这些实验的结果表明，方言所代表的嵌入空间受到了显著影响。这一发现意味着，除了之前研究的词汇和句法变异外，还存在语义上的方言变异。

0

相关内容

嵌入空间

【Hugging Face】指导文本生成与约束波束搜索🤗Transformers，Guiding Text Generation with Constrained Beam Search in 🤗 Transformers

【Hugging Face】指导文本生成与约束波束搜索🤗Transformers，Guiding Text Generation with Constrained Beam Search in 🤗 Transformers

专知会员服务

20+阅读 · 2022年3月18日

【Google-Mila】你的GAN实际上是一个基于能量的模型，你应该使用鉴别器驱动的潜在采样，Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling

【Google-Mila】你的GAN实际上是一个基于能量的模型，你应该使用鉴别器驱动的潜在采样，Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling

专知会员服务

29+阅读 · 2020年3月28日

【论文推荐WWW2020-UIUC】修正排序系统中的选择偏差：Correcting for Selection Bias in Learning-to-rank Systems

【论文推荐WWW2020-UIUC】修正排序系统中的选择偏差：Correcting for Selection Bias in Learning-to-rank Systems

专知会员服务

31+阅读 · 2020年2月1日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

43+阅读 · 2019年11月24日

【AAAI2020论文-清华大学】基于人物稀疏数据的预训练个性化对话生成模型（A Pre-training Based Personalized Dialogue Generation Model with Persona-sparse Data）

【AAAI2020论文-清华大学】基于人物稀疏数据的预训练个性化对话生成模型（A Pre-training Based Personalized Dialogue Generation Model with Persona-sparse Data）

专知会员服务

27+阅读 · 2019年11月15日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

31+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

53+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

144+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

168+阅读 · 2019年10月11日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

52+阅读 · 2019年9月29日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

23+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

26+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

15+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

17+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

3维Lorentz空间中的伪圆纹Willmore曲面与4维球面中的共形曲面论

国家自然科学基金

0+阅读 · 2014年12月31日

大气压感应耦合等离子体在聚酰亚胺上制备铜薄膜机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

飞秒矢量光场的超衍射极限微结构制备及特性研究

国家自然科学基金

0+阅读 · 2012年12月31日

EAST上离子回旋模式转换驱动等离子体转动的实验研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于Preisach算子的动力电池开路电压滞回效应建模及其多时间尺度在线估计

国家自然科学基金

0+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

统计学习中文问句分类方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

受激超冷气体的奇特量子输运效应

国家自然科学基金

0+阅读 · 2009年12月31日

Blazars和微类星体的多波段观测与研究

国家自然科学基金

0+阅读 · 2008年12月31日

M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models

Arxiv

0+阅读 · 2023年5月17日

Interactive and Incremental Learning of Spatial Object Relations from Human Demonstrations

Arxiv

0+阅读 · 2023年5月16日

Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation

Arxiv

0+阅读 · 2023年5月15日

Unsupervised Semantic Variation Prediction using the Distribution of Sibling Embeddings

Arxiv

0+阅读 · 2023年5月15日

Knowledge Refinement via Interaction Between Search Engines and Large Language Models

Arxiv

0+阅读 · 2023年5月12日

Investigating the Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes

Arxiv

0+阅读 · 2023年5月12日

Lifelong Embedding Learning and Transfer for Growing Knowledge Graphs

Arxiv

15+阅读 · 2022年11月29日

Updating Embeddings for Dynamic Knowledge Graphs

Arxiv

20+阅读 · 2021年9月22日

A survey of embedding models of entities and relationships for knowledge graph completion

Arxiv

23+阅读 · 2020年8月10日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

VIP会员

文章信息

相关主题

相关VIP内容

【Hugging Face】指导文本生成与约束波束搜索🤗Transformers，Guiding Text Generation with Constrained Beam Search in 🤗 Transformers

【Hugging Face】指导文本生成与约束波束搜索🤗Transformers，Guiding Text Generation with Constrained Beam Search in 🤗 Transformers

专知会员服务

20+阅读 · 2022年3月18日

【Google-Mila】你的GAN实际上是一个基于能量的模型，你应该使用鉴别器驱动的潜在采样，Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling

【Google-Mila】你的GAN实际上是一个基于能量的模型，你应该使用鉴别器驱动的潜在采样，Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling

专知会员服务

29+阅读 · 2020年3月28日

【论文推荐WWW2020-UIUC】修正排序系统中的选择偏差：Correcting for Selection Bias in Learning-to-rank Systems

【论文推荐WWW2020-UIUC】修正排序系统中的选择偏差：Correcting for Selection Bias in Learning-to-rank Systems

专知会员服务

31+阅读 · 2020年2月1日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

43+阅读 · 2019年11月24日

【AAAI2020论文-清华大学】基于人物稀疏数据的预训练个性化对话生成模型（A Pre-training Based Personalized Dialogue Generation Model with Persona-sparse Data）

【AAAI2020论文-清华大学】基于人物稀疏数据的预训练个性化对话生成模型（A Pre-training Based Personalized Dialogue Generation Model with Persona-sparse Data）

专知会员服务

27+阅读 · 2019年11月15日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

31+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

53+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

144+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

168+阅读 · 2019年10月11日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

52+阅读 · 2019年9月29日

热门VIP内容

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

23+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

26+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

15+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

17+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

相关论文

M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models

Arxiv

0+阅读 · 2023年5月17日

Interactive and Incremental Learning of Spatial Object Relations from Human Demonstrations

Arxiv

0+阅读 · 2023年5月16日

Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation

Arxiv

0+阅读 · 2023年5月15日

Unsupervised Semantic Variation Prediction using the Distribution of Sibling Embeddings

Arxiv

0+阅读 · 2023年5月15日

Knowledge Refinement via Interaction Between Search Engines and Large Language Models

Arxiv

0+阅读 · 2023年5月12日

Investigating the Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes

Arxiv

0+阅读 · 2023年5月12日

Lifelong Embedding Learning and Transfer for Growing Knowledge Graphs

Arxiv

15+阅读 · 2022年11月29日

Updating Embeddings for Dynamic Knowledge Graphs

Arxiv

20+阅读 · 2021年9月22日

A survey of embedding models of entities and relationships for knowledge graph completion

Arxiv

23+阅读 · 2020年8月10日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

相关基金

3维Lorentz空间中的伪圆纹Willmore曲面与4维球面中的共形曲面论

国家自然科学基金

0+阅读 · 2014年12月31日

大气压感应耦合等离子体在聚酰亚胺上制备铜薄膜机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

飞秒矢量光场的超衍射极限微结构制备及特性研究

国家自然科学基金

0+阅读 · 2012年12月31日

EAST上离子回旋模式转换驱动等离子体转动的实验研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于Preisach算子的动力电池开路电压滞回效应建模及其多时间尺度在线估计

国家自然科学基金

0+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

统计学习中文问句分类方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

受激超冷气体的奇特量子输运效应

国家自然科学基金

0+阅读 · 2009年12月31日

Blazars和微类星体的多波段观测与研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员