Linguistic Diversities of Demographic Groups in Twitter

The massive popularity of online social media provides a unique opportunity for researchers to study the linguistic characteristics and patterns of users' interactions. In this paper, we provide an in-depth characterization of language usage across demographic groups on Twitter. In particular, we extract the gender and race of Twitter users located in the U.S. using advanced image processing algorithms from Face++. We then investigate how demographic groups (i.e., male/female, Asian/Black/White) differ in their linguistic styles and in their interests. We extract linguistic features from six categories (affective attributes, cognitive attributes, lexical density and awareness, temporal references, social and personal concerns, and interpersonal focus) in order to identify similarities and differences in particular sets of writing attributes. In addition, we compute the absolute ranking difference of top phrases between demographic groups. As a further dimension of diversity, we also use the topics of interest that we retrieve for each user. Our analysis unveils clear differences in the writing styles (and the topics of interest) of different demographic groups, with variation seen along both gender and race lines. We hope our effort can stimulate the development of new studies related to demographic information in the online space.
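
The abstract mentions an absolute ranking difference of top phrases between demographic groups but does not say how it is computed. The following is a minimal sketch of one plausible reading, not the authors' implementation: each group's phrases are ranked by frequency, and for phrases that appear in both top-k lists the absolute difference of the two ranks is reported. The function names, the `top_k` parameter, the tie-breaking rule, and the toy counts are all assumptions made for illustration.

```python
from collections import Counter


def rank_phrases(counts, top_k):
    """Map each of the top_k most frequent phrases to its rank (1 = most frequent).

    Ties in frequency are broken alphabetically so the ranking is deterministic.
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return {phrase: rank for rank, (phrase, _) in enumerate(ordered, start=1)}


def absolute_rank_difference(counts_a, counts_b, top_k=100):
    """Return (phrase, |rank_a - rank_b|) pairs for phrases in both top_k lists.

    Results are sorted by decreasing rank difference, then alphabetically.
    """
    ranks_a = rank_phrases(counts_a, top_k)
    ranks_b = rank_phrases(counts_b, top_k)
    shared = ranks_a.keys() & ranks_b.keys()
    diffs = {p: abs(ranks_a[p] - ranks_b[p]) for p in shared}
    return sorted(diffs.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical phrase counts for two groups (invented, not data from the paper).
group_a = Counter({"good morning": 5, "game day": 4, "thank you": 3, "new video": 1})
group_b = Counter({"thank you": 6, "good morning": 4, "new video": 3, "game day": 2})

for phrase, diff in absolute_rank_difference(group_a, group_b, top_k=4):
    print(f"{phrase}: {diff}")
```

Phrases with the largest rank difference are the ones whose relative prominence differs most between the two groups. Restricting the comparison to phrases shared by both top-k lists is a design choice of this sketch; phrases that appear in only one list would need a separate convention.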
