1) [BERT-based text generation] Pretraining-Based Natural Language Generation for Text Summarization
4) Master's and doctoral theses | Knowledge-base-driven natural language understanding 04#
Neural text classification models typically treat output labels as categorical variables that lack description and semantics. This forces their parametrization to depend on the label set size, and hence they are unable to scale to large label sets or generalize to unseen ones. Existing joint input-label text models overcome these issues by exploiting label descriptions, but they are unable to capture complex label relationships, have rigid parametrization, and their gains on unseen labels often come at the expense of weak performance on the labels seen during training. In this paper, we propose a new input-label model which generalizes over previous such models, addresses their limitations, and does not compromise performance on seen labels. The model consists of a joint non-linear input-label embedding with controllable capacity and a joint-space-dependent classification unit which is trained with cross-entropy loss to optimize classification performance. We evaluate models on full-resource and low- or zero-resource text classification of multilingual news and biomedical text with a large label set. Our model outperforms monolingual and multilingual models which do not leverage label semantics, as well as previous joint input-label space models, in both scenarios.
Rapidly developing neural models have achieved performance in Chinese word segmentation (CWS) that is competitive with their traditional counterparts. However, most methods suffer from computational inefficiency, especially on long sentences, because of increasing model complexity and slower decoders. This paper presents a simple neural segmenter which directly labels whether a gap exists between adjacent characters to alleviate this drawback. Our segmenter is fully end-to-end and capable of performing segmentation very fast. We also show how performance differs across tag sets. The experiments show that our segmenter provides performance comparable to the state of the art.
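The gap-labeling formulation described above can be illustrated with a minimal sketch. It covers only the conversion between words and per-gap binary labels, not the neural model that predicts them. The function names `gaps_from_words` and `segment_from_gaps` and the 0/1 tag set are assumptions made for illustration and are not taken from the paper.

```python
from typing import List


def gaps_from_words(words: List[str]) -> List[int]:
    """Turn a segmented sentence into one binary label per gap.

    Label 1 marks a word boundary between two adjacent characters,
    label 0 marks a gap inside a word (hypothetical tag set).
    """
    gaps: List[int] = []
    for k, word in enumerate(words):
        # Gaps inside a word carry label 0.
        gaps.extend([0] * (len(word) - 1))
        # The gap between this word and the next one carries label 1.
        if k < len(words) - 1:
            gaps.append(1)
    return gaps


def segment_from_gaps(chars: str, gaps: List[int]) -> List[str]:
    """Rebuild the words from the characters and the per-gap labels."""
    if len(gaps) != max(len(chars) - 1, 0):
        raise ValueError("expected exactly one label per gap")
    words: List[str] = []
    start = 0
    for i, label in enumerate(gaps):
        if label == 1:
            words.append(chars[start:i + 1])
            start = i + 1
    if chars:
        words.append(chars[start:])
    return words


if __name__ == "__main__":
    labels = gaps_from_words(["我", "爱", "北京"])
    print(labels)
    print(segment_from_gaps("我爱北京", labels))
```

Because every gap is labeled independently in this sketch, decoding is a single linear pass over the sentence, which is one plausible reading of why such a segmenter can be fast.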
Computing universal distributed representations of sentences is a fundamental task in natural language processing. We propose ConsSent, a simple yet surprisingly powerful unsupervised method to learn such representations by enforcing consistency constraints on sequences of tokens. We consider two classes of such constraints: constraints on single sequences that form a sentence, and constraints between two sequences that form a sentence when merged. We learn sentence encoders by training them to distinguish between consistent and inconsistent examples, the latter being generated by randomly perturbing consistent examples in six different ways. Extensive evaluation on several transfer learning and linguistic probing tasks shows improved performance over strong unsupervised and supervised baselines, substantially surpassing them in several cases. Our best results are achieved by training sentence encoders in a multitask setting and by an ensemble of encoders trained on the individual tasks.
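A rough sketch of how consistent and inconsistent training pairs might be built is given below. The abstract does not list the six perturbations, so the two used here (deleting a token and swapping two adjacent tokens) are hypothetical stand-ins, as are the function names and the 1/0 labels.

```python
import random
from typing import Callable, List, Tuple

Tokens = List[str]


def delete_token(tokens: Tokens, rng: random.Random) -> Tokens:
    """Hypothetical perturbation: drop one randomly chosen token."""
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def swap_adjacent(tokens: Tokens, rng: random.Random) -> Tokens:
    """Hypothetical perturbation: swap one random pair of neighbours.

    If the two neighbours happen to be identical tokens, the output
    equals the input; a real pipeline would need to filter such cases.
    """
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def make_examples(sentence: str, rng: random.Random) -> List[Tuple[Tokens, int]]:
    """Return one consistent (label 1) and one perturbed (label 0) example."""
    tokens = sentence.split()
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to perturb")
    perturb: Callable[[Tokens, random.Random], Tokens] = rng.choice(
        [delete_token, swap_adjacent]
    )
    return [(tokens, 1), (perturb(tokens, rng), 0)]


if __name__ == "__main__":
    for toks, label in make_examples("the cat sat on the mat", random.Random(0)):
        print(label, toks)
```

A sentence encoder would then be trained as a binary classifier over such pairs; that training loop is outside the scope of this sketch.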
Large-scale probabilistic representations, including statistical knowledge bases and graphical models, are increasingly in demand. They are built by mining massive sources of structured and unstructured data, the latter often derived from natural language processing techniques. The very nature of the enterprise makes the extracted representations probabilistic. In particular, inducing relations and facts from noisy and incomplete sources via statistical machine learning models means that the labels are either already probabilistic, or that probabilities approximate confidence. While the progress is impressive, extracted representations essentially enforce the closed-world assumption (CWA), which means that all facts in the database are accorded the corresponding probability, but all other facts have probability zero. The CWA is deeply problematic in most machine learning contexts. A principled solution is needed for representing incomplete and indeterminate knowledge in such models; imprecise probability models such as credal networks are one example. In this work, we are interested in the foundational problem of learning such open-world probabilistic models. However, since exact inference in probabilistic graphical models is intractable, the paradigm of tractable learning has emerged to learn data structures (such as arithmetic circuits) that support efficient probabilistic querying. We show here how the computational machinery underlying tractable learning has to be generalized for imprecise probabilities. Our empirical evaluations demonstrate that our regime is also effective.
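The contrast between the closed-world assumption and an imprecise, open-world treatment can be made concrete with a small sketch. It is not the paper's learning method: the triple-based fact store, the function names, and the `upper_bound` parameter for unseen facts are all assumptions chosen for illustration.

```python
from typing import Dict, Tuple

# A fact is a (subject, relation, object) triple; this schema is hypothetical.
Fact = Tuple[str, str, str]


def closed_world_prob(db: Dict[Fact, float], fact: Fact) -> float:
    """Closed-world assumption: any fact missing from the database gets 0."""
    return db.get(fact, 0.0)


def open_world_interval(
    db: Dict[Fact, float], fact: Fact, upper_bound: float = 0.3
) -> Tuple[float, float]:
    """Imprecise alternative: a missing fact gets an interval [0, upper_bound].

    `upper_bound` is an assumed parameter, not a value from the paper.
    Stored facts keep a degenerate interval around their probability.
    """
    if fact in db:
        p = db[fact]
        return (p, p)
    return (0.0, upper_bound)


if __name__ == "__main__":
    db = {("Paris", "capital_of", "France"): 0.95}
    unseen = ("Lyon", "located_in", "France")
    print(closed_world_prob(db, unseen))
    print(open_world_interval(db, unseen))
```

Under the closed-world reading the unseen fact is treated as false with certainty, while the interval version only records that its probability is unknown within a bound.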
The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.
Natural Language Inference (NLI) is a fundamental and challenging task in Natural Language Processing (NLP). Most existing methods apply only a one-pass inference process to a mixed matching feature, which is a concatenation of different matching features between a premise and a hypothesis. In this paper, we propose a new model called Multi-turn Inference Matching Network (MIMN) to perform multi-turn inference on different matching features. In each turn, the model focuses on one particular matching feature instead of the mixed matching feature. To enhance the interaction between different matching features, a memory component is employed to store the inference history. The inference of each turn is performed on the current matching feature and the memory. We conduct experiments on three different NLI datasets. The experimental results show that our model outperforms or achieves state-of-the-art performance on all three datasets.
This paper proposes a variational self-attention model (VSAM) that employs variational inference to derive self-attention. We model the self-attention vector as a set of random variables by imposing a probability distribution. The self-attention mechanism summarizes source information as an attention vector by a weighted sum, where the weights are a learned probability distribution. Compared with its conventional deterministic counterpart, the stochastic units incorporated by VSAM allow multi-modal attention distributions. Furthermore, by marginalizing over the latent variables, VSAM is more robust against overfitting. Experiments on the stance detection task demonstrate the superiority of our method.
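The weighted-sum summary mentioned above can be sketched for the conventional deterministic case only; the variational treatment that VSAM adds is not reproduced here. The function names and the use of raw scores passed through a softmax are assumptions for illustration.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Normalize raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_summary(values: List[List[float]], scores: List[float]) -> List[float]:
    """Deterministic attention vector: softmax-weighted sum of source vectors."""
    if not values or len(values) != len(scores):
        raise ValueError("need one score per source vector")
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


if __name__ == "__main__":
    source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(attention_summary(source, [2.0, 0.5, 0.1]))
```

In this deterministic form the output is a single point; a stochastic variant would instead treat the resulting attention vector as a random variable with a learned distribution.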
The paper presents a first attempt towards unsupervised neural text simplification that relies only on unlabeled text corpora. The core framework comprises a shared encoder and a pair of attentional decoders that gain knowledge of both text simplification and complexification through discriminator-based losses, back-translation, and denoising. The framework is trained using unlabeled text collected from an English Wikipedia dump. Our analysis (both quantitative and qualitative, involving human evaluators) on public test data shows the efficacy of our model in performing simplification at both lexical and syntactic levels, competitive with existing supervised methods. We open-source our implementation for academic use.