As the first step in automated natural language processing, representing words and sentences is of central importance and has attracted significant research attention. Different approaches, from early one-hot and bag-of-words representations to more recent distributional dense and sparse representations, have been proposed. Despite the successful results that have been achieved, such vectors tend to consist of uninterpretable components and face nontrivial challenges in both memory and computational requirements in practical applications. In this paper, we design a novel representation model that projects dense word vectors into a higher-dimensional space and favors a highly sparse, binary representation with potentially interpretable components, while preserving the pairwise inner products between the original vectors as much as possible. Computationally, our model is relaxed into a symmetric non-negative matrix factorization problem, which admits a fast yet effective solution. In a series of empirical evaluations, the proposed model exhibited consistent improvements and high potential for practical applications.
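To make the relaxation concrete, the sketch below illustrates the general idea of a symmetric non-negative matrix factorization of a similarity (Gram) matrix, followed by a crude binarization. This is a minimal illustration under stated assumptions, not the paper's actual algorithm: the multiplicative-update rule (a standard SymNMF scheme), the damping constant `beta`, the clipping of negative similarities, and the mean-based binarization threshold are all assumptions made for the example.

```python
import numpy as np

def symmetric_nmf(G, k, n_iter=200, beta=0.5, seed=0):
    """Approximate a nonnegative similarity matrix G (n x n) as H @ H.T, H >= 0.

    Uses a standard damped multiplicative update; `beta` is an illustrative
    damping constant, not a value from the paper.
    """
    rng = np.random.default_rng(seed)
    n = G.shape[0]
    H = rng.random((n, k))  # positive initialization keeps updates nonnegative
    for _ in range(n_iter):
        GH = G @ H
        HHtH = H @ (H.T @ H)
        # damped multiplicative update: H stays elementwise nonnegative
        H = H * (1.0 - beta + beta * GH / np.maximum(HHtH, 1e-10))
    return H

# toy dense embeddings: 6 "words" in 3 dimensions
X = np.random.default_rng(1).normal(size=(6, 3))
G = np.maximum(X @ X.T, 0.0)  # clip negatives so a nonnegative factorization applies
H = symmetric_nmf(G, k=8)     # project into a higher-dimensional nonnegative space
B = (H > H.mean()).astype(int)  # crude binarization into sparse binary codes
```

The key design point the abstract alludes to is that factorizing the Gram matrix `G = X @ X.T` (rather than `X` itself) preserves pairwise inner products between words, so the learned codes inherit the similarity structure of the dense embeddings.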