The continuous bag-of-words (CBOW) model predicts a center word from its surrounding context words. Its input is the set of word vectors (one-hot encodings) for the words in the target word's context window; its output is the word vector (one-hot encoding) of that target word.
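As a minimal sketch of the idea (a toy example, not the paper's implementation): the one-hot context inputs are equivalent to row lookups in an input embedding matrix, which are averaged and passed through a softmax output layer to predict the center word. All names, sizes, and the corpus below are illustrative assumptions.

```python
import numpy as np

np.random.seed(0)

# Toy corpus and vocabulary (hypothetical, for illustration only)
corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
word2id = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8            # vocabulary size, embedding dimension

W_in = np.random.randn(V, D) * 0.1    # input (context) embeddings
W_out = np.random.randn(D, V) * 0.1   # output (center-word) weights

def cbow_step(context_ids, center_id, lr=0.1):
    """One CBOW step: average context embeddings (equivalent to
    multiplying averaged one-hot vectors by W_in), score every
    vocabulary word with softmax, backprop cross-entropy loss."""
    h = W_in[context_ids].mean(axis=0)        # averaged context vector
    scores = h @ W_out
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                      # softmax over vocabulary
    loss = -np.log(probs[center_id])
    # Cross-entropy gradients
    d_scores = probs.copy()
    d_scores[center_id] -= 1.0
    d_h = W_out @ d_scores
    W_out[...] -= lr * np.outer(h, d_scores)
    W_in[context_ids] -= lr * d_h / len(context_ids)
    return loss

window = 2
losses = []
for epoch in range(50):
    total = 0.0
    for i, w in enumerate(corpus):
        ctx = [word2id[corpus[j]]
               for j in range(max(0, i - window),
                              min(len(corpus), i + window + 1))
               if j != i]
        total += cbow_step(ctx, word2id[w])
    losses.append(total)

print(f"loss: {losses[0]:.2f} -> {losses[-1]:.2f}")
```

A real implementation replaces the full softmax with negative sampling or hierarchical softmax for efficiency; the paper below discusses how the gradient of the averaged context is handled in standard implementations.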

Latest paper

It is a common belief in the NLP community that continuous bag-of-words (CBOW) word embeddings tend to underperform skip-gram (SG) embeddings. We find that this belief is founded less on theoretical differences in their training objectives than on faulty CBOW implementations in standard software libraries such as the official implementation word2vec.c and Gensim. We show that our correct implementation of CBOW yields word embeddings that are fully competitive with SG on various intrinsic and extrinsic tasks while being more than three times as fast to train. We release our implementation, kōan, at https://github.com/bloomberg/koan.
