Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case Study using Latent Dirichlet Allocation Method - Zhuanzhi Papers


18 April 2023


Bernadeta Griciūtė, Lifeng Han, Goran Nenadic

From arXiv. Accepted to the International HealthNLP Workshop at IEEE ICHI 2023: https://ieeeichi.github.io/ICHI2023/

Topic Modelling (TM) is a research branch of natural language understanding (NLU) and natural language processing (NLP) that facilitates insightful analysis of large documents and datasets, such as summarising the main topics and tracking how topics change. This kind of discovery is becoming more popular in real-life applications because of its impact on big data analytics. In this study, from the social-media and healthcare domain, we apply the popular Latent Dirichlet Allocation (LDA) method to model topic changes in Swedish newspaper articles about Coronavirus. We describe the corpus we created, which comprises 6515 articles, the methods applied, and statistics on topic changes over a period of approximately one year and two months, from 17 January 2020 to 13 March 2021. We hope this work can be an asset for grounding applications of topic modelling and can inspire similar case studies in an era of pandemics, supporting socio-economic impact research as well as clinical and healthcare analytics. Our data and source code are openly available at https://github.com/poethan/Swed_Covid_TM

Keywords: Latent Dirichlet Allocation (LDA); Topic Modelling; Coronavirus; Pandemics; Natural Language Understanding; BERT-topic
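To make the idea of modelling topic changes with LDA concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, followed by a per-month average of the document-topic proportions. It is not the authors' implementation (their code is in the repository linked above). The toy Swedish tokens, the dates, the function names `lda_gibbs` and `monthly_topic_share`, and the hyperparameters `alpha`, `beta` and `iterations` are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA on tokenised documents with collapsed Gibbs sampling (illustrative)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: topic counts per document, word counts per topic, totals per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the current token from the counts before resampling it.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalised conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed estimate of each document's topic proportions.
    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in row]
        for row, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, vocab


def monthly_topic_share(dates, theta):
    """Average document-topic proportions per calendar month (YYYY-MM)."""
    grouped = defaultdict(list)
    for date, row in zip(dates, theta):
        grouped[date[:7]].append(row)
    return {
        month: [sum(col) / len(rows) for col in zip(*rows)]
        for month, rows in grouped.items()
    }


# Hypothetical toy corpus; the real corpus holds 6515 newspaper articles.
docs = [
    "virus smitta sjukhus vård".split(),
    "smitta virus sjukhus patient".split(),
    "vaccin dos vaccin leverans".split(),
    "vaccin leverans dos region".split(),
]
dates = ["2020-01-17", "2020-01-25", "2021-03-02", "2021-03-13"]

theta, topic_word, vocab = lda_gibbs(docs, num_topics=2)
shares = monthly_topic_share(dates, theta)
for month in sorted(shares):
    print(month, [round(p, 2) for p in shares[month]])
```

Each printed row gives the average share of each topic in that month; comparing rows across months is one simple way to read off topic changes over time. On a real corpus, an established library implementation of LDA with proper Swedish preprocessing would be the more practical choice.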

