Distributed word representations are widely used for modeling words in NLP tasks. Most existing models generate a single representation per word and do not consider the different meanings of a word. We present two approaches to learn multiple topic-sensitive representations per word using the Hierarchical Dirichlet Process. We observe that by modeling topics and integrating the topic distribution of each document, we obtain representations that are able to distinguish between different meanings of a given word. Our models yield statistically significant improvements on the lexical substitution task, indicating that commonly used single word representations, even when combined with contextual information, are insufficient for this task.
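For illustration only, the sketch below shows one way to obtain topic-sensitive word vectors in the spirit of the abstract, assuming gensim and numpy are available: an HDP model infers a per-document topic distribution, and a separate embedding table per topic is combined with those weights to produce a context-dependent vector for a word. The toy corpus, the helper `topic_sensitive_vector`, and the simple weighted-sum combination are illustrative assumptions, not the paper's exact model.

```python
# Minimal, illustrative sketch (not the paper's exact model): combine
# per-topic word vectors with an HDP-inferred document topic distribution.
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import HdpModel

# Toy corpus; in practice this would be a large tokenized collection.
docs = [
    ["bank", "money", "loan", "interest"],
    ["bank", "river", "water", "shore"],
]
dictionary = Dictionary(docs)
bows = [dictionary.doc2bow(d) for d in docs]

# Hierarchical Dirichlet Process topic model (nonparametric: the number
# of topics is inferred from the data rather than fixed in advance).
hdp = HdpModel(bows, id2word=dictionary)
num_topics = 20   # truncation level used for this illustration (assumed)
dim = 50          # embedding dimensionality (assumed)

# Hypothetical per-topic embedding table: one vector per (topic, word).
rng = np.random.default_rng(0)
topic_embeddings = rng.normal(size=(num_topics, len(dictionary), dim))

def topic_sensitive_vector(word, doc_bow):
    """Weight the word's per-topic vectors by the document's topic mixture."""
    word_id = dictionary.token2id[word]
    vec = np.zeros(dim)
    for topic_id, weight in hdp[doc_bow]:
        if topic_id < num_topics:
            vec += weight * topic_embeddings[topic_id, word_id]
    return vec

# The same word receives different representations in different documents,
# e.g. "bank" in a finance-themed vs. a river-themed document.
v_finance = topic_sensitive_vector("bank", bows[0])
v_river = topic_sensitive_vector("bank", bows[1])
```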