We propose a method to learn unsupervised sentence representations in a non-compositional manner based on Generative Latent Optimization. Our approach does not impose any assumptions on how words are to be combined into a sentence representation. We discuss a simple Bag of Words model as well as a variant that models word positions. Both are trained to reconstruct the sentence from a latent code, and our model can be used to generate text. Experiments show large improvements over the related Paragraph Vectors. Compared to uSIF, we achieve a relative improvement of 5% when trained on the same data, and our method performs competitively with Sent2vec while being trained on 30 times less data.
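The core idea can be illustrated with a toy sketch (this is a hypothetical illustration, not the authors' implementation): in GLO-style training, each sentence owns a free latent code that is optimized by gradient descent jointly with a shared decoder, here a simple linear map followed by a softmax that must reconstruct the sentence's bag of words. All variable names, dimensions, and the learning-rate schedule below are illustrative assumptions.

```python
import numpy as np

# Toy GLO-style bag-of-words sentence embeddings (illustrative sketch only).
# Each sentence i gets a free latent code Z[i]; a shared linear decoder W
# maps codes to vocabulary logits. Both Z and W are optimized to minimize
# cross-entropy against the sentence's empirical word distribution.

rng = np.random.default_rng(0)

sentences = [[0, 1, 2], [2, 3, 4], [0, 3]]  # toy corpus as word-id lists
vocab_size, dim, lr, steps = 5, 4, 0.2, 1000

Z = rng.normal(scale=0.1, size=(len(sentences), dim))  # one code per sentence
W = rng.normal(scale=0.1, size=(dim, vocab_size))      # shared decoder

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(steps):
    for i, words in enumerate(sentences):
        p = softmax(Z[i] @ W)
        # target: empirical bag-of-words distribution of the sentence
        t = np.bincount(words, minlength=vocab_size) / len(words)
        grad_logits = p - t                  # d(cross-entropy)/d(logits)
        W -= lr * np.outer(Z[i], grad_logits)
        Z[i] -= lr * (W @ grad_logits)       # GLO: the code itself is a parameter

# After training, the highest-probability words under each code
# should be exactly the words of the corresponding sentence.
for i, words in enumerate(sentences):
    p = softmax(Z[i] @ W)
    assert set(np.argsort(p)[-len(words):]) == set(words)
```

Because the codes are optimized directly rather than produced by an encoder, no compositional assumption (e.g. averaging word vectors) is imposed on how a sentence representation arises; the position-aware variant mentioned above would extend the decoder to predict words at specific positions.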