This paper takes a step towards theoretical analysis of the relationship between word embeddings and context embeddings in models such as word2vec. We start from basic probabilistic assumptions on the nature of word vectors, context vectors, and text generation. These assumptions are well supported either empirically or theoretically by the existing literature. Next, we show that under these assumptions the widely-used word-word PMI matrix is approximately a random symmetric Gaussian ensemble. This, in turn, implies that context vectors are reflections of word vectors in approximately half the dimensions. As a direct application of our result, we suggest a theoretically grounded way of tying weights in the SGNS model.
Rapidly developed neural models have achieved competitive performance in Chinese word segmentation (CWS) as their traditional counterparts. However, most of methods encounter the computational inefficiency especially for long sentences because of the increasing model complexity and slower decoders. This paper presents a simple neural segmenter which directly labels the gap existence between adjacent characters to alleviate the existing drawback. Our segmenter is fully end-to-end and capable of performing segmentation very fast. We also show a performance difference with different tag sets. The experiments show that our segmenter can provide comparable performance with state-of-the-art.
Often when multiple labels are obtained for a training example it is assumed that there is an element of noise that must be accounted for. It has been shown that this disagreement can be considered signal instead of noise. In this work we investigate using soft labels for training data to improve generalization in machine learning models. However, using soft labels for training Deep Neural Networks (DNNs) is not practical due to the costs involved in obtaining multiple labels for large data sets. We propose soft label memorization-generalization (SLMG), a fine-tuning approach to using soft labels for training DNNs. We assume that differences in labels provided by human annotators represent ambiguity about the true label instead of noise. Experiments with SLMG demonstrate improved generalization performance on the Natural Language Inference (NLI) task. Our experiments show that by injecting a small percentage of soft label training data (0.03% of training set size) we can improve generalization performance over several baselines.
The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.