对关于有病理丰富语言专题模式的柠檬化分析 (An Analysis of Lemmatization on Topic Models of Morphologically Rich Language) - 专知论文

会员服务 ·

0

话题模型 · MoDELS · 话题 · 相似度 · 维基百科 ·

2019 年 5 月 10 日

An Analysis of Lemmatization on Topic Models of Morphologically Rich Language

翻译：对关于有病理丰富语言专题模式的柠檬化分析

Chandler May,Ryan Cotterell,Benjamin Van Durme

Topic models are typically represented by top-$m$ word lists for human interpretation. The corpus is often pre-processed with lemmatization (or stemming) so that those representations are not undermined by a proliferation of words with similar meanings, but there is little public work on the effects of that pre-processing. Recent work studied the effect of stemming on topic models of English texts and found no supporting evidence for the practice. We study the effect of lemmatization on topic models of Russian Wikipedia articles, finding in one configuration that it significantly improves interpretability according to a word intrusion metric. We conclude that lemmatization may benefit topic models on morphologically rich languages, but that further investigation is needed.

翻译：专题模型通常由用于人文解释的顶值-百万美元单词列表代表,本套通常先用浸渍(或冲压)处理,这样,这些表述不会因具有类似含义的词语的激增而受到损害,但几乎没有关于预处理影响的公共工作。最近的工作研究了对英文文本专题模型的影响,没有发现支持这一做法的证据。我们研究了浸渍对俄罗斯维基百科文章专题模型的影响,在一个组合中发现,它大大改进了根据“侵入”指标的可解释性。我们的结论是,浸渍可能有利于关于形态丰富语言的专题模型,但还需要进一步调查。

1

相关内容

话题模型

自然语言处理中的注意力机制，Attention in Natural Language Processing

自然语言处理中的注意力机制，Attention in Natural Language Processing

专知会员服务

131+阅读 · 2020年5月30日

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

90+阅读 · 2020年4月18日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

12+阅读 · 2019年10月23日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

18+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

30+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

52+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

167+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

89+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

98+阅读 · 2019年10月9日

NLP学习新资料：旧金山大学2019夏季自然语言处理课程

NLP学习新资料：旧金山大学2019夏季自然语言处理课程

AINLP

8+阅读 · 2019年6月11日

VALSE Webinar 19-10期 Connecting Vision and Language to Action

VALSE Webinar 19-10期 Connecting Vision and Language to Action

VALSE

3+阅读 · 2019年4月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

25+阅读 · 2019年1月4日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

15+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

笔记 | Sentiment Analysis

笔记 | Sentiment Analysis

黑龙江大学自然语言处理实验室

10+阅读 · 2018年5月6日

人工智能 | 国际会议/SCI期刊约稿信息9条

人工智能 | 国际会议/SCI期刊约稿信息9条

Call4Papers

3+阅读 · 2018年1月12日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

8+阅读 · 2017年11月25日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Generating Fact Checking Explanations

Generating Fact Checking Explanations

Arxiv

9+阅读 · 2020年4月13日

Language Models as Knowledge Bases?

Arxiv

6+阅读 · 2019年9月4日

Latent Relation Language Models

Arxiv

21+阅读 · 2019年8月21日

A Probe into Understanding GAN and VAE models

A Probe into Understanding GAN and VAE models

Arxiv

9+阅读 · 2018年12月13日

Psychological State in Text: A Limitation of Sentiment Analysis

Arxiv

8+阅读 · 2018年6月3日

Improving Character-based Decoding Using Target-Side Morphological Information for Neural Machine Translation

Arxiv

3+阅读 · 2018年4月17日

Analyzing Uncertainty in Neural Machine Translation

Arxiv

6+阅读 · 2018年2月28日

Classifying Idiomatic and Literal Expressions Using Topic Models and Intensity of Emotions

Arxiv

4+阅读 · 2018年2月27日

Topic Compositional Neural Language Model

Arxiv

5+阅读 · 2017年12月29日

Multilingual Topic Models

Arxiv

3+阅读 · 2017年12月18日

VIP会员

文章信息

相关主题

相关VIP内容

自然语言处理中的注意力机制，Attention in Natural Language Processing

自然语言处理中的注意力机制，Attention in Natural Language Processing

专知会员服务

131+阅读 · 2020年5月30日

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

90+阅读 · 2020年4月18日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

12+阅读 · 2019年10月23日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

18+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

30+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

52+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

167+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

89+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

98+阅读 · 2019年10月9日

热门VIP内容

相关资讯

NLP学习新资料：旧金山大学2019夏季自然语言处理课程

NLP学习新资料：旧金山大学2019夏季自然语言处理课程

AINLP

8+阅读 · 2019年6月11日

VALSE Webinar 19-10期 Connecting Vision and Language to Action

VALSE Webinar 19-10期 Connecting Vision and Language to Action

VALSE

3+阅读 · 2019年4月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

25+阅读 · 2019年1月4日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

15+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

笔记 | Sentiment Analysis

笔记 | Sentiment Analysis

黑龙江大学自然语言处理实验室

10+阅读 · 2018年5月6日

人工智能 | 国际会议/SCI期刊约稿信息9条

人工智能 | 国际会议/SCI期刊约稿信息9条

Call4Papers

3+阅读 · 2018年1月12日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

8+阅读 · 2017年11月25日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Generating Fact Checking Explanations

Generating Fact Checking Explanations

Arxiv

9+阅读 · 2020年4月13日

Language Models as Knowledge Bases?

Arxiv

6+阅读 · 2019年9月4日

Latent Relation Language Models

Arxiv

21+阅读 · 2019年8月21日

A Probe into Understanding GAN and VAE models

A Probe into Understanding GAN and VAE models

Arxiv

9+阅读 · 2018年12月13日

Psychological State in Text: A Limitation of Sentiment Analysis

Arxiv

8+阅读 · 2018年6月3日

Improving Character-based Decoding Using Target-Side Morphological Information for Neural Machine Translation

Arxiv

3+阅读 · 2018年4月17日

Analyzing Uncertainty in Neural Machine Translation

Arxiv

6+阅读 · 2018年2月28日

Classifying Idiomatic and Literal Expressions Using Topic Models and Intensity of Emotions

Arxiv

4+阅读 · 2018年2月27日

Topic Compositional Neural Language Model

Arxiv

5+阅读 · 2017年12月29日

Multilingual Topic Models

Arxiv

3+阅读 · 2017年12月18日

微信扫码咨询专知VIP会员