Stylistic Variation in Social Media Part-of-Speech Tagging

Social media features substantial stylistic variation, raising new challenges for syntactic analysis of online writing. However, this variation is often aligned with author attributes such as age, gender, and geography, as well as with more readily available social network metadata. In this paper, we report new evidence on the link between language and social networks in the task of part-of-speech tagging. We find that tagger error rates are correlated with network structure, with high accuracy in some parts of the network and lower accuracy elsewhere. As a result, tagger accuracy depends on training from a balanced sample of the network, rather than training on texts from a narrow subcommunity. We also describe our attempts to add robustness to stylistic variation by building a mixture-of-experts model in which each expert is associated with a region of the social network. While prior work found that similar approaches yield performance improvements in sentiment analysis and entity linking, we were unable to obtain performance improvements in part-of-speech tagging, despite strong evidence for the link between part-of-speech error rates and social network structure.
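The mixture-of-experts idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it assumes each expert already outputs a probability distribution over tags for one token, and that a gate weight per expert is available for the author (for example, derived from the author's position in the social network). The names mix_experts and predict_tag, the two-expert setup, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Distribution = Dict[str, float]


def mix_experts(expert_dists: List[Distribution],
                gate_weights: List[float]) -> Distribution:
    """Combine per-expert tag distributions into one distribution.

    Gate weights are normalized to sum to 1, then each expert's
    probabilities are scaled by its weight and summed per tag.
    """
    if len(expert_dists) != len(gate_weights):
        raise ValueError("need exactly one gate weight per expert")
    total = sum(gate_weights)
    if total <= 0:
        raise ValueError("gate weights must sum to a positive value")

    mixed: Distribution = {}
    for dist, weight in zip(expert_dists, gate_weights):
        for tag, prob in dist.items():
            mixed[tag] = mixed.get(tag, 0.0) + (weight / total) * prob
    return mixed


def predict_tag(expert_dists: List[Distribution],
                gate_weights: List[float]) -> str:
    """Return the most probable tag under the mixed distribution."""
    mixed = mix_experts(expert_dists, gate_weights)
    return max(mixed, key=mixed.get)


# Hypothetical experts for one ambiguous token: each expert is imagined
# to have been trained on a different region of the social network.
experts = [
    {"NOUN": 0.7, "VERB": 0.3},  # expert for region A (assumed)
    {"NOUN": 0.2, "VERB": 0.8},  # expert for region B (assumed)
]

# An author close to region A relies mostly on the first expert,
# an author close to region B mostly on the second.
tag_for_author_a = predict_tag(experts, [0.9, 0.1])
tag_for_author_b = predict_tag(experts, [0.1, 0.9])
```

In this sketch the same token receives a different tag depending on which region of the network the author is assigned to, which is the kind of author-dependent behavior such a model is meant to capture. As the abstract reports, this approach did not improve part-of-speech tagging accuracy in the authors' experiments.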

Part-of-Speech Tagging

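Part of speech is a basic grammatical property of a word, and is also commonly called word class. Part-of-speech tagging is the process of determining the grammatical category of each word in a given sentence and labeling the word accordingly; it is an important foundational problem in Chinese information processing. In corpus linguistics, part-of-speech tagging (POS tagging, PoS tagging, or POST), also called grammatical tagging, is the process of marking each word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context. [1]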
