【AI Daily】2019-03-01 Friday

March 1 | 好东西传送门 (Good Stuff Portal)

【Machine Learning】

1) Douglas Hofstadter (侯世达): the man who wants machines to learn to think

https://mp.weixin.qq.com/s/ZGOe6dJ9gb7tXogDit7Xmw

2) F-Principle: a first look at applying deep learning to computational mathematics

https://mp.weixin.qq.com/s/yXGuoBPmIeA3SteD_TliXg


【New Technologies and Applications】

1) Why internet companies follow a "raising chickens" model while AI companies follow a "raising children" model

https://mp.weixin.qq.com/s/64PjHAi7yzqmNciJKFmylQ

2) Is a new technology really reliable? Just try it in China and you will find out

https://mp.weixin.qq.com/s/jzCJZ9mfTCgdYfwEygNhAg

3) Li Renjie, head of NetEase Fuxi AI Lab: empowering and deploying AI in games

https://mp.weixin.qq.com/s/1FJcKskI2LPhwhIREVtlCg


【Fintech】 

1) Jianwei Data (见微数据): a powerful tool for searching corporate announcements

https://dwz.cn/WUh2njfl

2) The general manager of the Agricultural Bank of China's R&D center: how can technology R&D break through in the digital transformation?

https://dwz.cn/8YGsovUU

3) A study of the fintech strategies of 22 regional banks: understanding, paths, and scenarios

https://dwz.cn/Fw4ayWI3 


【Natural Language Processing】

1) 【BERT-based Text Generation】Pretraining-Based Natural Language Generation for Text Summarization

http://www.weibo.com/2678093863/HiHJibA4n 

2) A text-dataset construction toolkit that crawls, cleans, and deduplicates web pages to build large-scale monolingual datasets

http://www.weibo.com/1402400261/HiHSweAHK 

3) Topic modeling examples with spaCy/Gensim/Textacy

http://www.weibo.com/1402400261/HiHPjF9VM 

4) Master's and PhD thesis series | Natural language understanding over knowledge bases, #04

https://mp.weixin.qq.com/s/hBcsPcs1z9GyYK2RoJeCFg

5) WeChat AI takes the global championship in an NLP competition

https://mp.weixin.qq.com/s/Jnp6jmy-8lloI7p4dAofKg



This introduction aims to tell the story of how we put words into computers. It is part of the story of the field of natural language processing (NLP), a branch of artificial intelligence. It targets a wide audience with a basic understanding of computer programming, but avoids a detailed mathematical treatment, and it does not present any algorithms. It also does not focus on any particular application of NLP such as translation, question answering, or information extraction. The ideas presented here were developed by many researchers over many decades, so the citations are not exhaustive but rather direct the reader to a handful of papers that are, in the author's view, seminal. After reading this document, you should have a general understanding of word vectors (also known as word embeddings): why they exist, what problems they solve, where they come from, how they have changed over time, and what some of the open questions about them are. Readers already familiar with word vectors are advised to skip to Section 5 for the discussion of the most recent advance, contextual word vectors.
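As a heavily simplified illustration of what a static word vector table is, the sketch below stores a few toy vectors and compares them with cosine similarity. The vocabulary and numbers are invented for illustration only; they are not learned from any corpus.

```python
# Minimal sketch of a static word-vector table (toy values, not learned from data).
import numpy as np

embeddings = {
    "king":  np.array([0.8, 0.3, 0.1]),
    "queen": np.array([0.7, 0.4, 0.2]),
    "apple": np.array([0.1, 0.9, 0.6]),
}

def cosine(u, v):
    """Cosine similarity, the usual way to compare two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embeddings["king"], embeddings["queen"]))  # similar words -> higher score
print(cosine(embeddings["king"], embeddings["apple"]))  # dissimilar words -> lower score
```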


Self-attention is a useful mechanism to build generative models for language and images. It determines the importance of context elements by comparing each element to the current time step. In this paper, we show that a very lightweight convolution can perform competitively with the best reported self-attention results. Next, we introduce dynamic convolutions, which are simpler and more efficient than self-attention. We predict separate convolution kernels based solely on the current time step in order to determine the importance of context elements. The number of operations required by this approach scales linearly in the input length, whereas self-attention is quadratic. Experiments on large-scale machine translation, language modeling and abstractive summarization show that dynamic convolutions improve over strong self-attention models. On the WMT'14 English-German test set, dynamic convolutions achieve a new state of the art of 29.7 BLEU.
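The core mechanism is easy to sketch: a small layer predicts a normalized kernel from the current time step alone, and that kernel mixes a fixed-width local context window. The toy PyTorch module below illustrates this under simplifying assumptions (one kernel shared across all channels, causal left padding); it is not the authors' fairseq implementation.

```python
import torch
import torch.nn.functional as F
from torch import nn

class ToyDynamicConv(nn.Module):
    """Toy dynamic convolution: the kernel is predicted from the current step only."""
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        self.kernel_predictor = nn.Linear(dim, kernel_size)    # kernel depends only on x_t

    def forward(self, x):                                      # x: (batch, time, dim)
        pad = self.kernel_size - 1
        kernels = F.softmax(self.kernel_predictor(x), dim=-1)  # (batch, time, K), normalized
        x_padded = F.pad(x, (0, 0, pad, 0))                    # causal left padding
        windows = x_padded.unfold(1, self.kernel_size, 1)      # (batch, time, dim, K)
        # Weighted sum over the local window: cost grows linearly with sequence length.
        return torch.einsum("btdk,btk->btd", windows, kernels)

out = ToyDynamicConv(dim=8)(torch.randn(2, 5, 8))
print(out.shape)  # torch.Size([2, 5, 8])
```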


Often when multiple labels are obtained for a training example it is assumed that there is an element of noise that must be accounted for. It has been shown that this disagreement can be considered signal instead of noise. In this work we investigate using soft labels for training data to improve generalization in machine learning models. However, using soft labels for training Deep Neural Networks (DNNs) is not practical due to the costs involved in obtaining multiple labels for large data sets. We propose soft label memorization-generalization (SLMG), a fine-tuning approach to using soft labels for training DNNs. We assume that differences in labels provided by human annotators represent ambiguity about the true label instead of noise. Experiments with SLMG demonstrate improved generalization performance on the Natural Language Inference (NLI) task. Our experiments show that by injecting a small percentage of soft label training data (0.03% of training set size) we can improve generalization performance over several baselines.
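A hedged sketch of the basic ingredient, training against a soft target distribution derived from multiple annotators, is shown below. The voting counts are invented, and the snippet only illustrates generic soft-label cross-entropy; SLMG's exact fine-tuning recipe is described in the paper.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 3, requires_grad=True)           # model outputs: 4 examples, 3 classes

# Hypothetical annotator votes, e.g. 3 of 5 annotators chose class 0 for the first example.
votes = torch.tensor([[3., 2., 0.],
                      [0., 5., 0.],
                      [1., 1., 3.],
                      [2., 2., 1.]])
soft_targets = votes / votes.sum(dim=1, keepdim=True)     # empirical label distribution

log_probs = F.log_softmax(logits, dim=1)
loss = -(soft_targets * log_probs).sum(dim=1).mean()      # cross-entropy against soft labels
loss.backward()
print(float(loss))
```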


A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just as it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on one hand, and genetic, geographical, and several levels of structural similarity between languages on the other. Of these, structural similarity is found to correlate most strongly with language representation similarity, while genetic relationships, a convenient benchmark used for evaluation in previous work, appear to be a confounding factor. Apart from implications about translation effects, we see this more generally as a case where NLP and linguistic typology can interact and benefit one another.
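The analysis can be sketched compactly: compute pairwise similarities between learned language representations and rank-correlate them with an external (structural, genetic, or geographic) similarity matrix. All embeddings and typological scores below are toy stand-ins, not the paper's data or its exact statistical procedure.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
languages = ["da", "sv", "de", "fi"]
lang_vecs = {l: rng.normal(size=16) for l in languages}      # stand-in language embeddings
structural_sim = {("da", "sv"): 0.9, ("da", "de"): 0.7, ("da", "fi"): 0.2,
                  ("sv", "de"): 0.7, ("sv", "fi"): 0.2, ("de", "fi"): 0.3}  # toy typology scores

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

pairs = sorted(structural_sim)
embedding_sims = [cosine(lang_vecs[a], lang_vecs[b]) for a, b in pairs]
external_sims = [structural_sim[p] for p in pairs]
rho, p = spearmanr(embedding_sims, external_sims)            # rank correlation between the two views
print(f"Spearman rho = {rho:.2f} (p = {p:.2f})")
```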


This paper proposes a variational self-attention model (VSAM) that employs variational inference to derive self-attention. We model the self-attention vector as random variables by imposing a probabilistic distribution. The self-attention mechanism summarizes source information as an attention vector by weighted sum, where the weights are a learned probabilistic distribution. Compared with its conventional deterministic counterpart, the stochastic units incorporated by VSAM allow multi-modal attention distributions. Furthermore, by marginalizing over the latent variables, VSAM is more robust against overfitting. Experiments on the stance detection task demonstrate the superiority of our method.
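In the spirit of the description above, the sketch below treats per-position attention scores as latent Gaussian variables, samples them with the reparameterization trick, and adds a KL term toward a standard normal prior. It is a generic illustration of stochastic attention, not the exact VSAM formulation.

```python
import torch
import torch.nn.functional as F
from torch import nn

class StochasticAttention(nn.Module):
    """Attention scores as latent Gaussians, sampled via reparameterization."""
    def __init__(self, dim):
        super().__init__()
        self.mu = nn.Linear(dim, 1)        # mean of the latent score per position
        self.log_var = nn.Linear(dim, 1)   # log-variance of the latent score

    def forward(self, h):                                  # h: (batch, time, dim)
        mu = self.mu(h).squeeze(-1)                        # (batch, time)
        log_var = self.log_var(h).squeeze(-1)
        eps = torch.randn_like(mu)
        score = mu + eps * torch.exp(0.5 * log_var)        # reparameterized sample
        weights = F.softmax(score, dim=-1)                 # stochastic attention weights
        context = torch.einsum("bt,btd->bd", weights, h)   # attention vector (weighted sum)
        # KL divergence to a standard normal prior; added to the task loss during training.
        kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(dim=-1).mean()
        return context, kl

context, kl = StochasticAttention(dim=8)(torch.randn(2, 5, 8))
print(context.shape, float(kl))
```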


Extracting appropriate features to represent a corpus is an important task in text mining. Previous attention-based work usually enhances features at the lexical level and lacks exploration of feature augmentation at the sentence level. In this paper, we exploit a Dynamic Feature Generation Network (DFGN) to solve this problem. Specifically, DFGN generates features based on a variety of attention mechanisms and attaches the features to the sentence representation. A thresholder is then designed to filter the mined features automatically. DFGN extracts the most significant characteristics from datasets to maintain its practicality and robustness. Experimental results on multiple well-known answer selection datasets show that our proposed approach significantly outperforms state-of-the-art baselines. We give a detailed analysis of the experiments to illustrate why DFGN provides excellent retrieval and interpretive ability.
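A rough sketch of the "generate attention-based features, then filter them with a thresholder" idea follows. The attention layout, the gate, and the hard threshold are generic placeholders chosen for brevity, not the concrete DFGN design.

```python
import torch
import torch.nn.functional as F
from torch import nn

class ThresholdedFeatures(nn.Module):
    """Mine features with attention, drop those whose gate score falls below tau."""
    def __init__(self, dim, n_features=4, tau=0.5):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_features, dim))   # one query per mined feature
        self.scorer = nn.Linear(dim, 1)                             # relevance score per feature
        self.tau = tau

    def forward(self, h):                                           # h: (batch, time, dim)
        attn = F.softmax(self.queries @ h.transpose(1, 2), dim=-1)  # (batch, n_feat, time)
        feats = attn @ h                                            # attention-generated features
        keep = torch.sigmoid(self.scorer(feats)) > self.tau         # thresholder: keep or drop
        feats = feats * keep.float()                                # filtered features
        return torch.cat([h.mean(dim=1), feats.flatten(1)], dim=-1) # attach to sentence representation

out = ThresholdedFeatures(dim=8)(torch.randn(2, 5, 8))
print(out.shape)  # torch.Size([2, 40])
```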


Deep neural networks and decision trees operate on largely separate paradigms; typically, the former performs representation learning with pre-specified architectures, while the latter is characterised by learning hierarchies over pre-specified features with data-driven architectures. We unite the two via adaptive neural trees (ANTs), a model that incorporates representation learning into edges, routing functions and leaf nodes of a decision tree, along with a backpropagation-based training algorithm that adaptively grows the architecture from primitive modules (e.g., convolutional layers). ANTs allow increased interpretability via hierarchical clustering, e.g., learning meaningful class associations, such as separating natural vs. man-made objects. We demonstrate this on classification and regression tasks, achieving over 99% and 90% accuracy on the MNIST and CIFAR-10 datasets, and outperforming standard neural networks, random forests and gradient boosted trees on the SARCOS dataset. Furthermore, ANT optimisation naturally adapts the architecture to the size and complexity of the training data.
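The core ingredient is straightforward to sketch: a tree node whose routing function, edge transformations, and leaves are all small neural modules, with probabilistic (soft) routing. The single-split module below is an illustrative simplification; the actual model grows such nodes adaptively from primitive modules during training.

```python
import torch
import torch.nn.functional as F
from torch import nn

class SoftTreeNode(nn.Module):
    """One soft split with a neural router, neural edges, and neural leaves."""
    def __init__(self, dim, n_classes):
        super().__init__()
        self.router = nn.Linear(dim, 1)                                  # routing function
        self.left_edge = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())   # representation learning on edges
        self.right_edge = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.left_leaf = nn.Linear(dim, n_classes)                       # leaf predictors
        self.right_leaf = nn.Linear(dim, n_classes)

    def forward(self, x):                                    # x: (batch, dim)
        p_left = torch.sigmoid(self.router(x))               # probability of routing left
        left = F.softmax(self.left_leaf(self.left_edge(x)), dim=-1)
        right = F.softmax(self.right_leaf(self.right_edge(x)), dim=-1)
        # Mixture of leaf distributions weighted by the routing probabilities.
        return p_left * left + (1 - p_left) * right

probs = SoftTreeNode(dim=16, n_classes=10)(torch.randn(4, 16))
print(probs.shape, probs.sum(dim=-1))  # each row sums to 1
```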
