Daily Papers | Google's New BERT Model Sets Records on Multiple NLP Tasks; Three Families of Probabilistic Models Explained; A Different Route to Multi-Task Learning

October 12, 2018 · 论智

1

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Yesterday, Google released a paper introducing a new language representation model called BERT (Bidirectional Encoder Representations from Transformers). Unlike recent language representation models, BERT pretrains deep bidirectional representations by jointly conditioning on both left and right context in all layers. BERT is conceptually simple yet empirically powerful: it achieves state-of-the-art results on eleven NLP tasks, pushing the GLUE benchmark score to 80.4% (a 7.6% absolute improvement), MultiNLI accuracy to 86.7% (a 5.6% absolute improvement), and the SQuAD v1.1 question answering Test F1 to 93.2, which is 2.0 points above human performance.

Link: https://arxiv.org/abs/1810.04805
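
For intuition, here is a minimal sketch of the masked-token pretraining idea that lets BERT condition on both left and right context: a fraction of input tokens is corrupted and the model is trained to recover them. The 15% / 80-10-10 proportions follow the paper; the toy vocabulary and helper function below are our own illustration, not the authors' code.

```python
import random

# BERT-style masked-token corruption: the model must predict the original
# token at each corrupted position, using context on both sides.
VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]  # toy vocabulary
MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            targets.append(tok)                   # token the model must recover
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)               # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(rng.choice(VOCAB))  # 10%: replace with random token
            else:
                inputs.append(tok)                # 10%: keep unchanged
        else:
            inputs.append(tok)
            targets.append(None)                  # no loss on unmasked positions
    return inputs, targets

if __name__ == "__main__":
    print(mask_tokens("the cat sat on the mat".split()))
```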

2

A Tale of Three Probabilistic Families: Discriminative, Descriptive and Generative Models

Grenander's pattern theory is a mathematical framework in which patterns are represented by probability models on random variables of algebraic structures. In this paper, we review the "three families" of probability models, namely the discriminative, descriptive, and generative models. We survey these models within a common framework and explore their connections.

Link: https://arxiv.org/abs/1810.04261
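
To make the taxonomy concrete, one standard way to write the three families is shown below: a discriminative classifier, an energy-based descriptive model, and a latent-variable generator. The notation is our own shorthand for these usual formulations, not lifted verbatim from the paper.

```latex
% Discriminative: model the label given the signal directly.
P_\theta(y \mid x) = \frac{\exp\{f_\theta(x, y)\}}{\sum_{y'} \exp\{f_\theta(x, y')\}}

% Descriptive (energy-based): model the signal by an unnormalized density.
P_\theta(x) = \frac{1}{Z(\theta)} \exp\{f_\theta(x)\},
\quad Z(\theta) = \int \exp\{f_\theta(x)\}\, dx

% Generative: map latent variables to the signal through a generator.
x = g_\theta(z) + \epsilon, \quad z \sim \mathcal{N}(0, I_d)
```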

3

Multi-Task Learning as Multi-Objective Optimization

In multi-task learning, multiple tasks are solved jointly, sharing inductive bias between them. Multi-task learning is inherently a multi-objective problem: different tasks may conflict, so trade-offs are required. A common compromise is to optimize a proxy objective that minimizes a weighted linear combination of per-task losses, but this approach is only effective for simple tasks. In this paper, we cast multi-task learning as multi-objective optimization, where the overall objective is to find a Pareto-optimal solution. We show that our method achieves state-of-the-art performance on recent multi-task learning problems.

Link: https://arxiv.org/abs/1810.04650
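
For two tasks, the minimum-norm convex combination of the task gradients, a building block of this kind of Pareto approach, has a simple closed form. The sketch below uses our own toy gradients and is an illustration of that closed form, not the authors' code.

```python
import numpy as np

def min_norm_alpha(g1, g2):
    """Weight alpha for the minimum-norm convex combination
    alpha*g1 + (1-alpha)*g2 of two task gradients, clipped to [0, 1]."""
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        return 0.5  # identical gradients: any weighting works
    alpha = float((g2 - g1) @ g2) / denom
    return min(1.0, max(0.0, alpha))

# Toy gradients for two conflicting tasks (illustration only).
g1 = np.array([1.0, 0.0])
g2 = np.array([-0.5, 1.0])
a = min_norm_alpha(g1, g2)
d = a * g1 + (1 - a) * g2  # shared descent direction for both tasks
print(a, d, np.linalg.norm(d))
```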

Learned data models based on sparsity are widely used in signal processing and imaging applications. A variety of methods for learning synthesis dictionaries, sparsifying transforms, etc., have been proposed in recent years, often imposing useful structures or properties on the models. In this work, we focus on sparsifying transform learning, which enjoys a number of advantages. We consider multi-layer or nested extensions of the transform model, and propose efficient learning algorithms. Numerical experiments with image data illustrate the behavior of the multi-layer transform learning algorithm and its usefulness for image denoising. Multi-layer models provide better denoising quality than single layer schemes.
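
As a reference point, here is a minimal single-layer sketch of sparsifying transform learning: alternate between hard-thresholding the transformed data and updating the transform, with a log-determinant term keeping the transform well-conditioned. The paper's multi-layer models stack such transforms; the hyperparameters and the simple gradient update below are our own simplification, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: columns stand in for vectorized image patches.
n, N = 16, 200
X = rng.normal(size=(n, N))

W = np.eye(n)                    # square sparsifying transform
lam, thresh, lr = 0.1, 0.5, 1e-3

for it in range(200):
    # (1) Sparse coding: hard-threshold the transformed data.
    Z = W @ X
    Z[np.abs(Z) < thresh] = 0.0
    # (2) Transform update: gradient step on
    #     ||W X - Z||_F^2 - lam * log|det W|
    grad = 2.0 * (W @ X - Z) @ X.T - lam * np.linalg.inv(W).T
    W -= lr * grad

print("fraction of zeroed coefficients:", np.mean(Z == 0))
```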

Background: Social media can afford the healthcare industry valuable feedback from patients who reveal and express their medical decision-making process, as well as self-reported quality of life indicators both during and post treatment. In prior work (Crannell et al.), we studied an active cancer patient population on Twitter and compiled a set of tweets describing their experience with this disease. We refer to these online public testimonies as "Invisible Patient Reported Outcomes" (iPROs), because they carry relevant indicators yet are difficult to capture by conventional means of self-report. Methods: Our present study aims to identify tweets related to the patient experience as an additional informative tool for monitoring public health. Using Twitter's public streaming API, we compiled over 5.3 million "breast cancer" related tweets spanning September 2016 through mid-December 2017. We combined supervised machine learning methods with natural language processing to sift out tweets relevant to breast cancer patient experiences. We analyzed a sample of 845 breast cancer patient and survivor accounts, responsible for over 48,000 posts. We investigated tweet content with a hedonometric sentiment analysis to quantitatively extract emotionally charged topics. Results: We found that positive experiences were shared regarding patient treatment, raising support, and spreading awareness. Further discussions related to healthcare were prevalent and largely negative, focusing on fear of political legislation that could result in loss of coverage. Conclusions: Social media can provide a positive outlet for patients to discuss their needs and concerns regarding their healthcare coverage and treatment needs. Capturing iPROs from online communication can help inform healthcare professionals and lead to more connected and personalized treatment regimens.
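
A hedonometric analysis scores text by the frequency-weighted average happiness of its words, typically excluding near-neutral words (a "stop lens"). The sketch below illustrates that computation; the tiny lexicon is hypothetical, whereas the real instrument uses roughly 10,000 human-rated words on a 1-9 scale.

```python
from collections import Counter

HAPPINESS = {  # hypothetical ratings on the hedonometer's 1-9 scale
    "love": 8.4, "support": 7.2, "awareness": 6.6,
    "fear": 2.8, "loss": 2.6, "cancer": 2.2,
}

def hedonometer_score(text, lens=(4.0, 6.0)):
    """Frequency-weighted mean happiness of rated, non-neutral words."""
    counts = Counter(text.lower().split())
    num = den = 0.0
    for word, freq in counts.items():
        h = HAPPINESS.get(word)
        if h is None or lens[0] <= h <= lens[1]:
            continue  # unknown or near-neutral words carry no signal
        num += freq * h
        den += freq
    return num / den if den else None

print(hedonometer_score("love and support raising awareness"))
```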

This paper presents the first study aimed at capturing stylistic similarity between words in an unsupervised manner. We propose extending the continuous bag of words (CBOW) model (Mikolov et al., 2013) to learn style-sensitive word vectors using a wider context window under the assumption that the style of all the words in an utterance is consistent. In addition, we introduce a novel task to predict lexical stylistic similarity and to create a benchmark dataset for this task. Our experiment with this dataset supports our assumption and demonstrates that the proposed extensions contribute to the acquisition of style-sensitive word embeddings.
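
The key assumption is that style is consistent across an utterance, so widening the CBOW context window from a few neighboring words to the whole utterance shifts what the embeddings capture from local syntax and semantics toward style. The pairing sketch below is our own illustration of that difference, not the authors' code.

```python
def cbow_pairs(tokens, window):
    """Yield (context, target) pairs for a CBOW-style objective."""
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        yield context, target

utterance = "yo dude that movie was totally awesome".split()

narrow = list(cbow_pairs(utterance, window=2))             # usual CBOW context
wide = list(cbow_pairs(utterance, window=len(utterance)))  # utterance-wide
print(narrow[2])  # local neighbors only
print(wide[2])    # every other word in the utterance
```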

Distributed word representations are widely used for modeling words in NLP tasks. Most existing models generate one representation per word and do not consider the different meanings of a word. We present two approaches to learning multiple topic-sensitive representations per word using the Hierarchical Dirichlet Process. We observe that by modeling topics and integrating topic distributions for each document, we obtain representations that are able to distinguish between different meanings of a given word. Our models yield statistically significant improvements on the lexical substitution task, indicating that commonly used single-word representations, even when combined with contextual information, are insufficient for this task.
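
A first step such an approach needs is inferring per-document topic distributions, so that word occurrences can later be soft-assigned to topic-specific vectors. As a hedged sketch, gensim's HdpModel can play that role; the toy corpus below is ours, and the downstream per-topic embedding step is omitted.

```python
from gensim.corpora import Dictionary
from gensim.models import HdpModel

docs = [
    ["bank", "money", "loan", "interest"],
    ["bank", "river", "water", "fishing"],
    ["money", "market", "interest", "rate"],
]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# HDP infers the number of topics from the data (no fixed K).
hdp = HdpModel(corpus, id2word=dictionary)
for i, bow in enumerate(corpus):
    print(i, hdp[bow])  # sparse (topic_id, weight) pairs per document
```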

In this paper, we propose a concise deep learning approach to collaborative filtering that jointly models distributional representations of users and items. The proposed framework outperforms current state-of-the-art algorithms, making distributional representation models a promising direction for further research in collaborative filtering.
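
As a minimal sketch of the general idea (ours, not the authors' architecture): learn one distributed vector per user and per item, score an interaction by their dot product, and fit by SGD on observed ratings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim, lr, reg = 5, 7, 4, 0.05, 0.01

# (user, item, rating) triples: toy data for illustration.
ratings = [(0, 1, 5.0), (0, 3, 3.0), (1, 1, 4.0), (2, 6, 2.0), (4, 0, 5.0)]

U = 0.1 * rng.normal(size=(n_users, dim))  # user representations
V = 0.1 * rng.normal(size=(n_items, dim))  # item representations

for epoch in range(100):
    for u, i, r in ratings:
        pu, qi = U[u].copy(), V[i].copy()
        err = r - pu @ qi                    # prediction error
        U[u] += lr * (err * qi - reg * pu)   # regularized SGD updates
        V[i] += lr * (err * pu - reg * qi)

print("predicted:", round(float(U[0] @ V[1]), 2), "actual: 5.0")
```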
