Read, Highlight and Summarize: A Hierarchical Neural Semantic Encoder-based Approach

Traditional sequence-to-sequence (seq2seq) models and variations of the attention mechanism, such as hierarchical attention, have been applied to the text summarization problem. Although human language is hierarchical, with paragraphs formed from sentences and sentences from words, hierarchical models have usually not performed much better than their traditional seq2seq counterparts. This is mainly because hierarchical attention mechanisms are either too sparse when using hard attention or too noisy when using soft attention. In this paper, we propose a method based on extracting the highlights of a document, that is, the key concepts conveyed in a few sentences. In a typical text summarization dataset, where documents average 800 tokens in length, capturing long-term dependencies is very important; for example, the last sentence can be grouped with the first sentence of a document to form a summary. LSTMs (Long Short-Term Memory networks) have proved useful for machine translation, but they often fail to capture long-term dependencies when modeling long sequences. To address these issues, we adapt Neural Semantic Encoders (NSE), a class of memory-augmented neural networks, to text summarization by improving their functionality, and propose a novel hierarchical NSE that significantly outperforms similar previous models. Summarization quality was further improved by augmenting each word in the dataset with linguistic factors, namely lemma and Part-of-Speech (PoS) tags, for better vocabulary coverage and generalization. The hierarchical NSE model on the factored dataset outperformed the state of the art by nearly 4 ROUGE points. We further designed and used the first GPU-based self-critical Reinforcement Learning model.
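The contrast the abstract draws between sparse hard attention and noisy soft attention can be illustrated with a small sketch. This is not the paper's model: the scores and value vectors below are made-up toy numbers, the function names are hypothetical, and hard attention is shown in its deterministic argmax form as a simplifying assumption (it is often implemented by sampling instead).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(scores, values):
    """Soft attention: every position contributes, weighted by softmax(scores)."""
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


def hard_attention(scores, values):
    """Hard attention (argmax variant, an assumption): only one position is read."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    weights = [1.0 if i == best else 0.0 for i in range(len(scores))]
    return weights, list(values[best])


# Toy relevance scores for four sentences and their 2-d representations.
scores = [2.0, 1.0, 0.5, 0.1]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]

soft_w, soft_ctx = soft_attention(scores, values)
hard_w, hard_ctx = hard_attention(scores, values)

print("soft weights:", [round(w, 3) for w in soft_w])
print("hard weights:", hard_w)
```

In this toy setting the soft weights are all non-zero, so low-relevance sentences still leak into the context vector, while the hard weights discard everything except one sentence.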

Related content

Attention mechanism

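The attention mechanism was first proposed in the field of computer vision, but it really took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment at the same time on a machine translation task; their work is generally regarded as the first to apply attention to NLP. Similar attention-based extensions of RNN models were then applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the rough trend of progress in attention research.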