利用主题模式和情感强度对不专业和文学表达方式进行分类 (Classifying Idiomatic and Literal Expressions Using Topic Models and Intensity of Emotions) - 专知论文

会员服务 ·

0

话题模型 · 话题 · 潜在狄利克雷分配 · 聚类方法 · 异常点 ·

2018 年 2 月 27 日

Classifying Idiomatic and Literal Expressions Using Topic Models and Intensity of Emotions

翻译：利用主题模式和情感强度对不专业和文学表达方式进行分类

Jing Peng,Anna Feldman,Ekaterina Vylomova

from arxiv, EMNLP 2014

We describe an algorithm for automatic classification of idiomatic and literal expressions. Our starting point is that words in a given text segment, such as a paragraph, that are highranking representatives of a common topic of discussion are less likely to be a part of an idiomatic expression. Our additional hypothesis is that contexts in which idioms occur, typically, are more affective and therefore, we incorporate a simple analysis of the intensity of the emotions expressed by the contexts. We investigate the bag of words topic representation of one to three paragraphs containing an expression that should be classified as idiomatic or literal (a target phrase). We extract topics from paragraphs containing idioms and from paragraphs containing literals using an unsupervised clustering method, Latent Dirichlet Allocation (LDA) (Blei et al., 2003). Since idiomatic expressions exhibit the property of non-compositionality, we assume that they usually present different semantics than the words used in the local topic. We treat idioms as semantic outliers, and the identification of a semantic shift as outlier detection. Thus, this topic representation allows us to differentiate idioms from literals using local semantic contexts. Our results are encouraging.

翻译：我们描述一个自动分类单词和字面表达式的算法。我们的出发点是,某文本部分,例如某段落中的词语,作为共同讨论议题的高级代表,不太可能成为单词表达式的一部分。我们另外的假设是,在通常情况下,出现单词的背景更具有感知性,因此,我们对背景所表现的情绪强度进行简单分析。我们调查一至三段的单词表达式,其中包含一个应归类为单词或字面表达式的表达式(一个目标短语)。我们从含有异语的段落和含有公升数的段落中提取专题,使用非监督的组别方法(LDA)(Blei等人,2003年)。由于单词表达式表达式表现出非共性的特点,我们假定它们通常呈现与本地议题中使用的单词不同的语性。我们把异语作为语流处理,并将语义变化的识别为局外探测。因此,我们使用本地语背景的演示结果可以使我们区分。

4

相关内容

话题模型

【2020关键词提取】医学报告的关键词提取和结构化，Keyword extraction and structuralization of medical reports

【2020关键词提取】医学报告的关键词提取和结构化，Keyword extraction and structuralization of medical reports

专知会员服务

32+阅读 · 2020年5月2日

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

专知会员服务

22+阅读 · 2020年4月22日

【WWW2020-UIUC】自动主题分类法构建，Automated Topic Taxonomy Construction

【WWW2020-UIUC】自动主题分类法构建，Automated Topic Taxonomy Construction

专知会员服务

39+阅读 · 2020年3月22日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

55+阅读 · 2020年1月25日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

158+阅读 · 2020年1月16日

【图机器学习论文】图神经网络的逻辑表达性（Logical Expressiveness of Graph Neural Networks）

【图机器学习论文】图神经网络的逻辑表达性（Logical Expressiveness of Graph Neural Networks）

专知会员服务

39+阅读 · 2019年12月30日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

53+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

167+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

77+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

98+阅读 · 2019年10月9日

已删除

AI掘金志

7+阅读 · 2019年7月8日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

25+阅读 · 2019年5月18日

【TED】生命中的每一年的智慧

【TED】生命中的每一年的智慧

英语演讲视频每日一推

9+阅读 · 2019年1月29日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

16+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

专知

19+阅读 · 2018年6月26日

【论文推荐】最新六篇主题模型相关论文—动态主题模型、主题趋势、大规模并行采样、随机采样、非参主题建模

【论文推荐】最新六篇主题模型相关论文—动态主题模型、主题趋势、大规模并行采样、随机采样、非参主题建模

专知

14+阅读 · 2018年6月24日

【论文推荐】最新六篇主题模型相关论文—收敛率、大规模、深度主题建模、优化、情绪强度、广义动态主题模型

【论文推荐】最新六篇主题模型相关论文—收敛率、大规模、深度主题建模、优化、情绪强度、广义动态主题模型

专知

11+阅读 · 2018年3月29日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Extending Machine Language Models toward Human-Level Language Understanding

Extending Machine Language Models toward Human-Level Language Understanding

Arxiv

4+阅读 · 2019年12月12日

Text Classification Algorithms: A Survey

Arxiv

5+阅读 · 2019年4月25日

A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

Arxiv

7+阅读 · 2018年6月12日

Topic Modelling of Empirical Text Corpora: Validity, Reliability, and Reproducibility in Comparison to Semantic Maps

Arxiv

4+阅读 · 2018年6月4日

Application of Rényi and Tsallis Entropies to Topic Modeling Optimization

Arxiv

6+阅读 · 2018年2月28日

Learning Topic Models by Neighborhood Aggregation

Arxiv

3+阅读 · 2018年2月22日

SpectralLeader: Online Spectral Learning for Single Topic Models

Arxiv

4+阅读 · 2018年2月16日

Knowledge-based Word Sense Disambiguation using Topic Models

Arxiv

5+阅读 · 2018年1月5日

Multilingual Topic Models

Arxiv

3+阅读 · 2017年12月18日

Detecting Curve Text in the Wild: New Dataset and New Solution

Arxiv

4+阅读 · 2017年12月6日

VIP会员

文章信息

相关主题

潜在狄利克雷分配

相关VIP内容

【2020关键词提取】医学报告的关键词提取和结构化，Keyword extraction and structuralization of medical reports

【2020关键词提取】医学报告的关键词提取和结构化，Keyword extraction and structuralization of medical reports

专知会员服务

32+阅读 · 2020年5月2日

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

专知会员服务

22+阅读 · 2020年4月22日

【WWW2020-UIUC】自动主题分类法构建，Automated Topic Taxonomy Construction

【WWW2020-UIUC】自动主题分类法构建，Automated Topic Taxonomy Construction

专知会员服务

39+阅读 · 2020年3月22日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

55+阅读 · 2020年1月25日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

158+阅读 · 2020年1月16日

【图机器学习论文】图神经网络的逻辑表达性（Logical Expressiveness of Graph Neural Networks）

【图机器学习论文】图神经网络的逻辑表达性（Logical Expressiveness of Graph Neural Networks）

专知会员服务

39+阅读 · 2019年12月30日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

53+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

167+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

77+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

98+阅读 · 2019年10月9日

热门VIP内容

相关资讯

已删除

AI掘金志

7+阅读 · 2019年7月8日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

25+阅读 · 2019年5月18日

【TED】生命中的每一年的智慧

【TED】生命中的每一年的智慧

英语演讲视频每日一推

9+阅读 · 2019年1月29日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

16+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

【论文推荐】最新六篇主题模型相关论文—领域特定知识库、神经变分推断、动态和静态主题模型

专知

19+阅读 · 2018年6月26日

【论文推荐】最新六篇主题模型相关论文—动态主题模型、主题趋势、大规模并行采样、随机采样、非参主题建模

【论文推荐】最新六篇主题模型相关论文—动态主题模型、主题趋势、大规模并行采样、随机采样、非参主题建模

专知

14+阅读 · 2018年6月24日

【论文推荐】最新六篇主题模型相关论文—收敛率、大规模、深度主题建模、优化、情绪强度、广义动态主题模型

【论文推荐】最新六篇主题模型相关论文—收敛率、大规模、深度主题建模、优化、情绪强度、广义动态主题模型

专知

11+阅读 · 2018年3月29日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Extending Machine Language Models toward Human-Level Language Understanding

Extending Machine Language Models toward Human-Level Language Understanding

Arxiv

4+阅读 · 2019年12月12日

Text Classification Algorithms: A Survey

Arxiv

5+阅读 · 2019年4月25日

A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

Arxiv

7+阅读 · 2018年6月12日

Topic Modelling of Empirical Text Corpora: Validity, Reliability, and Reproducibility in Comparison to Semantic Maps

Arxiv

4+阅读 · 2018年6月4日

Application of Rényi and Tsallis Entropies to Topic Modeling Optimization

Arxiv

6+阅读 · 2018年2月28日

Learning Topic Models by Neighborhood Aggregation

Arxiv

3+阅读 · 2018年2月22日

SpectralLeader: Online Spectral Learning for Single Topic Models

Arxiv

4+阅读 · 2018年2月16日

Knowledge-based Word Sense Disambiguation using Topic Models

Arxiv

5+阅读 · 2018年1月5日

Multilingual Topic Models

Arxiv

3+阅读 · 2017年12月18日

Detecting Curve Text in the Wild: New Dataset and New Solution

Arxiv

4+阅读 · 2017年12月6日

微信扫码咨询专知VIP会员