Work in Computational Affective Science and Computational Social Science explores a wide variety of research questions about people, emotions, behavior, and health. Such work often relies on language data that is first labeled with relevant information, such as the use of emotion words or the age of the speaker. Although many resources and algorithms exist to enable this type of labeling, discovering, accessing, and using them remains a substantial impediment, particularly for practitioners outside of computer science. Here, we present the ABCDE dataset (Affect, Body, Cognition, Demographics, and Emotion), a large-scale collection of over 400 million text utterances drawn from social media, blogs, books, and AI-generated sources. The dataset is annotated with a wide range of features relevant to computational affective and social science. ABCDE facilitates interdisciplinary research across numerous fields, including affective science, cognitive science, the digital humanities, sociology, political science, and computational linguistics.