Learning to construct text representations in end-to-end systems can be difficult, as natural languages are highly compositional and task-specific annotated datasets are often limited in size. Methods for directly supervising language composition can allow us to guide the models based on existing knowledge, regularizing them towards more robust and interpretable representations. In this paper, we investigate how objectives at different granularities can be used to learn better language representations and we propose an architecture for jointly learning to label sentences and tokens. The predictions at each level are combined together using an attention mechanism, with token-level labels also acting as explicit supervision for composing sentence-level representations. Our experiments show that by learning to perform these tasks jointly on multiple levels, the model achieves substantial improvements for both sentence classification and sequence labeling.

### 相关内容

Many tasks in natural language processing can be viewed as multi-label classification problems. However, most of the existing models are trained with the standard cross-entropy loss function and use a fixed prediction policy (e.g., a threshold of 0.5) for all the labels, which completely ignores the complexity and dependencies among different labels. In this paper, we propose a meta-learning method to capture these complex label dependencies. More specifically, our method utilizes a meta-learner to jointly learn the training policies and prediction policies for different labels. The training policies are then used to train the classifier with the cross-entropy loss function, and the prediction policies are further implemented for prediction. Experimental results on fine-grained entity typing and text classification demonstrate that our proposed method can obtain more accurate multi-label classification results.

CRF has been used as a powerful model for statistical sequence labeling. For neural sequence labeling, however, BiLSTM-CRF does not always lead to better results compared with BiLSTM-softmax local classification. This can be because the simple Markov label transition model of CRF does not give much information gain over strong neural encoding. For better representing label sequences, we investigate a hierarchically-refined label attention network, which explicitly leverages label embeddings and captures potential long-term label dependency by giving each word incrementally refined label distributions with hierarchical attention. Results on POS tagging, NER and CCG supertagging show that the proposed model not only improves the overall tagging accuracy with similar number of parameters, but also significantly speeds up the training and testing compared to BiLSTM-CRF.

In recent years, the sequence-to-sequence learning neural networks with attention mechanism have achieved great progress. However, there are still challenges, especially for Neural Machine Translation (NMT), such as lower translation quality on long sentences. In this paper, we present a hierarchical deep neural network architecture to improve the quality of long sentences translation. The proposed network embeds sequence-to-sequence neural networks into a two-level category hierarchy by following the coarse-to-fine paradigm. Long sentences are input by splitting them into shorter sequences, which can be well processed by the coarse category network as the long distance dependencies for short sentences is able to be handled by network based on sequence-to-sequence neural network. Then they are concatenated and corrected by the fine category network. The experiments shows that our method can achieve superior results with higher BLEU(Bilingual Evaluation Understudy) scores, lower perplexity and better performance in imitating expression style and words usage than the traditional networks.

This work investigates the alignment problem in state-of-the-art multi-head attention models based on the transformer architecture. We demonstrate that alignment extraction in transformer models can be improved by augmenting an additional alignment head to the multi-head source-to-target attention component. This is used to compute sharper attention weights. We describe how to use the alignment head to achieve competitive performance. To study the effect of adding the alignment head, we simulate a dictionary-guided translation task, where the user wants to guide translation using pre-defined dictionary entries. Using the proposed approach, we achieve up to $3.8$ % BLEU improvement when using the dictionary, in comparison to $2.4$ % BLEU in the baseline case. We also propose alignment pruning to speed up decoding in alignment-based neural machine translation (ANMT), which speeds up translation by a factor of $1.8$ without loss in translation performance. We carry out experiments on the shared WMT 2016 English$\to$Romanian news task and the BOLT Chinese$\to$English discussion forum task.

In recent years, a specific machine learning method called deep learning has gained huge attraction, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also been shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for the problems with high dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning with a focus on the most used deep architectures such as autoencoders, convolutional neural networks and recurrent neural networks which have successfully been come together with the reinforcement learning framework.

There has been much recent work on training neural attention models at the sequence-level using either reinforcement learning-style methods or by optimizing the beam. In this paper, we survey a range of classical objective functions that have been widely used to train linear models for structured prediction and apply them to neural sequence to sequence models. Our experiments show that these losses can perform surprisingly well by slightly outperforming beam search optimization in a like for like setup. We also report new state of the art results on both IWSLT'14 German-English translation as well as Gigaword abstractive summarization. On the larger WMT'14 English-French translation task, sequence-level training achieves 41.5 BLEU which is on par with the state of the art.

We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and learning transfer functions between label embeddings, enabling us to jointly leverage unlabelled data and auxiliary, annotated datasets. We evaluate our approach on a variety of sequence classification tasks with disparate label spaces. We outperform strong single and multi-task baselines and achieve a new state-of-the-art for topic-based sentiment analysis.

Homographs, words with different meanings but the same surface form, have long caused difficulty for machine translation systems, as it is difficult to select the correct translation based on the context. However, with the advent of neural machine translation (NMT) systems, which can theoretically take into account global sentential context, one may hypothesize that this problem has been alleviated. In this paper, we first provide empirical evidence that existing NMT systems in fact still have significant problems in properly translating ambiguous words. We then proceed to describe methods, inspired by the word sense disambiguation literature, that model the context of the input word with context-aware word embeddings that help to differentiate the word sense be- fore feeding it into the encoder. Experiments on three language pairs demonstrate that such models improve the performance of NMT systems both in terms of BLEU score and in the accuracy of translating homographs.

Most previous event extraction studies have relied heavily on features derived from annotated event mentions, thus cannot be applied to new event types without annotation effort. In this work, we take a fresh look at event extraction and model it as a grounding problem. We design a transferable neural architecture, mapping event mentions and types jointly into a shared semantic space using structural and compositional neural networks, where the type of each event mention can be determined by the closest of all candidate types . By leveraging (1)~available manual annotations for a small set of existing event types and (2)~existing event ontologies, our framework applies to new event types without requiring additional annotation. Experiments on both existing event types (e.g., ACE, ERE) and new event types (e.g., FrameNet) demonstrate the effectiveness of our approach. \textit{Without any manual annotations} for 23 new event types, our zero-shot framework achieved performance comparable to a state-of-the-art supervised model which is trained from the annotations of 500 event mentions.

Jiawei Wu,Wenhan Xiong,William Yang Wang
14+阅读 · 2019年9月9日
Leyang Cui,Yue Zhang
3+阅读 · 2019年8月23日
Si Zuo,Zhimin Xu
3+阅读 · 2018年11月23日
Tamer Alkhouli,Gabriel Bretschner,Hermann Ney
3+阅读 · 2018年9月11日
12+阅读 · 2018年6月23日
Sergey Edunov,Myle Ott,Michael Auli,David Grangier,Marc'Aurelio Ranzato
5+阅读 · 2018年5月24日
Isabelle Augenstein,Sebastian Ruder,Anders Søgaard
3+阅读 · 2018年4月9日
Shikun Liu,Edward Johns,Andrew J. Davison
12+阅读 · 2018年3月28日
Frederick Liu,Han Lu,Graham Neubig
3+阅读 · 2018年3月28日
Lifu Huang,Heng Ji,Kyunghyun Cho,Clare R. Voss
10+阅读 · 2017年7月4日

28+阅读 · 2020年5月9日

60+阅读 · 2020年3月18日

43+阅读 · 2019年10月12日

29+阅读 · 2019年10月10日

28+阅读 · 2019年10月10日

CreateAMind
9+阅读 · 2019年5月22日
CreateAMind
6+阅读 · 2019年5月18日
CreateAMind
20+阅读 · 2019年1月4日
CreateAMind
26+阅读 · 2019年1月3日
CreateAMind
20+阅读 · 2018年9月12日
CreateAMind
15+阅读 · 2018年5月25日

7+阅读 · 2018年5月4日
CreateAMind
9+阅读 · 2018年4月27日
Top