利用社会媒体文本中的用于采矿区域主义的用户频率信息 (Exploiting user-frequency information for mining regionalisms from Social Media texts) - 专知论文

会员服务 ·

0

INFORMS · MINE · 信息理论 · 特征选择 · CASE ·

2019 年 7 月 10 日

Exploiting user-frequency information for mining regionalisms from Social Media texts

翻译：利用社会媒体文本中的用于采矿区域主义的用户频率信息

Juan Manuel Pérez,Damián E. Aleman,Santiago N. Kalinowski,Agustín Gravano

The task of detecting regionalisms (expressions or words used in certain regions) has traditionally relied on the use of questionnaires and surveys, and has also heavily depended on the expertise and intuition of the surveyor. The irruption of Social Media and its microblogging services has produced an unprecedented wealth of content, mainly informal text generated by users, opening new opportunities for linguists to extend their studies of language variation. Previous work on automatic detection of regionalisms depended mostly on word frequencies. In this work, we present a novel metric based on Information Theory that incorporates user frequency. We tested this metric on a corpus of Argentinian Spanish tweets in two ways: via manual annotation of the relevance of the retrieved terms, and also as a feature selection method for geolocation of users. In either case, our metric outperformed other techniques based solely in word frequency, suggesting that measuring the amount of users that produce a word is informative. This tool has helped lexicographers discover several unregistered words of Argentinian Spanish, as well as different meanings assigned to registered words.

翻译：发现区域主义(在某些地区使用的表达或词词)的任务传统上依赖于问卷和调查的使用,也在很大程度上依赖测量员的专门知识和直觉。社会媒体及其微博客服务的破坏产生了前所未有的大量内容,主要是用户产生的非正式文本,为语言学家提供了新的机会,以扩大语言差异的研究。以前自动发现区域主义的工作主要取决于文字频率。在这项工作中,我们根据信息理论提出了一个包含用户频率的新指标。我们用两种方式在阿根廷西班牙语推文中测试了这一指标:人工说明检索到的术语的相关性,同时也作为用户地理位置的特征选择方法。在这两种情况下,我们的衡量标准都超越了仅以文字频率为基础的其他技术,表明衡量生成一个词的用户的数量是信息性的。这一工具帮助词汇学家发现了阿根廷西班牙语的几种未注册词,以及对登记词的不同含义。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

【KDD2019|讲座推荐】社会用户兴趣挖掘：方法与应用：Social User Interest Mining: Methods and Applications

【KDD2019|讲座推荐】社会用户兴趣挖掘：方法与应用：Social User Interest Mining: Methods and Applications

专知会员服务

39+阅读 · 2019年12月11日

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

专知会员服务

41+阅读 · 2019年11月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

18+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

56+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

53+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

52+阅读 · 2019年9月29日

【RecSys 2019报告】用于旅游业的推荐系统（Building Useful Recommender Systems for Tourists）

【RecSys 2019报告】用于旅游业的推荐系统（Building Useful Recommender Systems for Tourists）

专知会员服务

31+阅读 · 2019年9月19日

【IJCAI 2019】细粒度的意见挖掘:当前趋势和前沿维度（Fine-grained Opinion Mining: Current Trend and Cutting-Edge Dimensions），虞剑飞

【IJCAI 2019】细粒度的意见挖掘:当前趋势和前沿维度（Fine-grained Opinion Mining: Current Trend and Cutting-Edge Dimensions），虞剑飞

专知会员服务

25+阅读 · 2019年8月11日

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

Call4Papers

10+阅读 · 2019年6月24日

计算机 | 中低难度国际会议信息8条

计算机 | 中低难度国际会议信息8条

Call4Papers

9+阅读 · 2019年6月19日

计算机 | IUI 2020等国际会议信息4条

计算机 | IUI 2020等国际会议信息4条

Call4Papers

6+阅读 · 2019年6月17日

计算机 | EMNLP 2019等国际会议信息6条

计算机 | EMNLP 2019等国际会议信息6条

Call4Papers

18+阅读 · 2019年4月26日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

人工智能 | 国际会议信息6条

人工智能 | 国际会议信息6条

Call4Papers

4+阅读 · 2019年1月4日

大数据 | 顶级SCI期刊专刊/国际会议信息7条

大数据 | 顶级SCI期刊专刊/国际会议信息7条

Call4Papers

10+阅读 · 2018年12月29日

计算机类 | ISCC 2019等国际会议信息9条

计算机类 | ISCC 2019等国际会议信息9条

Call4Papers

5+阅读 · 2018年12月25日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

15+阅读 · 2018年12月24日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

Multimodal Categorization of Crisis Events in Social Media

Multimodal Categorization of Crisis Events in Social Media

Arxiv

19+阅读 · 2020年4月10日

Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements

Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements

Arxiv

15+阅读 · 2020年1月2日

The Measure of Intelligence

The Measure of Intelligence

Arxiv

6+阅读 · 2019年11月5日

From VQA to Multimodal CQA: Adapting Visual QA Models for Community QA Tasks

From VQA to Multimodal CQA: Adapting Visual QA Models for Community QA Tasks

Arxiv

8+阅读 · 2018年8月29日

Leveraging Long and Short-term Information in Content-aware Movie Recommendation

Arxiv

8+阅读 · 2018年5月2日

Stylistic Variation in Social Media Part-of-Speech Tagging

Arxiv

4+阅读 · 2018年4月19日

Multimodal Named Entity Recognition for Short Social Media Posts

Arxiv

8+阅读 · 2018年2月22日

TransRev: Modeling Reviews as Translations from Users to Items

Arxiv

9+阅读 · 2018年1月30日

SentiPers: A Sentiment Analysis Corpus for Persian

Arxiv

5+阅读 · 2018年1月23日

Natural Language Processing: State of The Art, Current Trends and Challenges

Arxiv

4+阅读 · 2017年8月17日

VIP会员

文章信息

相关主题

相关VIP内容

【KDD2019|讲座推荐】社会用户兴趣挖掘：方法与应用：Social User Interest Mining: Methods and Applications

【KDD2019|讲座推荐】社会用户兴趣挖掘：方法与应用：Social User Interest Mining: Methods and Applications

专知会员服务

39+阅读 · 2019年12月11日

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

专知会员服务

41+阅读 · 2019年11月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

18+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

56+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

53+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

52+阅读 · 2019年9月29日

【RecSys 2019报告】用于旅游业的推荐系统（Building Useful Recommender Systems for Tourists）

【RecSys 2019报告】用于旅游业的推荐系统（Building Useful Recommender Systems for Tourists）

专知会员服务

31+阅读 · 2019年9月19日

【IJCAI 2019】细粒度的意见挖掘:当前趋势和前沿维度（Fine-grained Opinion Mining: Current Trend and Cutting-Edge Dimensions），虞剑飞

【IJCAI 2019】细粒度的意见挖掘:当前趋势和前沿维度（Fine-grained Opinion Mining: Current Trend and Cutting-Edge Dimensions），虞剑飞

专知会员服务

25+阅读 · 2019年8月11日

热门VIP内容

相关资讯

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

Call4Papers

10+阅读 · 2019年6月24日

计算机 | 中低难度国际会议信息8条

计算机 | 中低难度国际会议信息8条

Call4Papers

9+阅读 · 2019年6月19日

计算机 | IUI 2020等国际会议信息4条

计算机 | IUI 2020等国际会议信息4条

Call4Papers

6+阅读 · 2019年6月17日

计算机 | EMNLP 2019等国际会议信息6条

计算机 | EMNLP 2019等国际会议信息6条

Call4Papers

18+阅读 · 2019年4月26日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

人工智能 | 国际会议信息6条

人工智能 | 国际会议信息6条

Call4Papers

4+阅读 · 2019年1月4日

大数据 | 顶级SCI期刊专刊/国际会议信息7条

大数据 | 顶级SCI期刊专刊/国际会议信息7条

Call4Papers

10+阅读 · 2018年12月29日

计算机类 | ISCC 2019等国际会议信息9条

计算机类 | ISCC 2019等国际会议信息9条

Call4Papers

5+阅读 · 2018年12月25日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

15+阅读 · 2018年12月24日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

相关论文

Multimodal Categorization of Crisis Events in Social Media

Multimodal Categorization of Crisis Events in Social Media

Arxiv

19+阅读 · 2020年4月10日

Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements

Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements

Arxiv

15+阅读 · 2020年1月2日

The Measure of Intelligence

The Measure of Intelligence

Arxiv

6+阅读 · 2019年11月5日

From VQA to Multimodal CQA: Adapting Visual QA Models for Community QA Tasks

From VQA to Multimodal CQA: Adapting Visual QA Models for Community QA Tasks

Arxiv

8+阅读 · 2018年8月29日

Leveraging Long and Short-term Information in Content-aware Movie Recommendation

Arxiv

8+阅读 · 2018年5月2日

Stylistic Variation in Social Media Part-of-Speech Tagging

Arxiv

4+阅读 · 2018年4月19日

Multimodal Named Entity Recognition for Short Social Media Posts

Arxiv

8+阅读 · 2018年2月22日

TransRev: Modeling Reviews as Translations from Users to Items

Arxiv

9+阅读 · 2018年1月30日

SentiPers: A Sentiment Analysis Corpus for Persian

Arxiv

5+阅读 · 2018年1月23日

Natural Language Processing: State of The Art, Current Trends and Challenges

Arxiv

4+阅读 · 2017年8月17日

微信扫码咨询专知VIP会员