CODA-19: Using a Non-Expert Crowd to Annotate Research Aspects on 10,000+ Abstracts in the COVID-19 Open Research Dataset


Topics: COVID-19 · Model evaluation · Majority voting · Datasets · Annotation

September 17, 2020


Ting-Hao 'Kenneth' Huang, Chieh-Yang Huang, Chien-Kuang Cornelia Ding, Yen-Chia Hsu, C. Lee Giles

From arXiv. Accepted by the NLP COVID-19 Workshop at ACL 2020. (The data, code, and model are available at: https://github.com/windx0303/CODA-19)

This paper introduces CODA-19, a human-annotated dataset that codes the Background, Purpose, Method, Finding/Contribution, and Other sections of 10,966 English abstracts in the COVID-19 Open Research Dataset. CODA-19 was created by 248 crowd workers from Amazon Mechanical Turk within 10 days, and achieved labeling quality comparable to that of experts. Each abstract was annotated by nine different workers, and the final labels were acquired by majority vote. The inter-annotator agreement (Cohen's kappa) between the crowd and the biomedical expert (0.741) is comparable to inter-expert agreement (0.788). CODA-19's labels have an accuracy of 82.2% when compared to the biomedical expert's labels, while the accuracy between experts was 85.0%. Reliable human annotations help scientists access and integrate the rapidly accelerating coronavirus literature, and also serve as the battery of AI/NLP research, but obtaining expert annotations can be slow. We demonstrated that a non-expert crowd can be rapidly employed at scale to join the fight against COVID-19.
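The aggregation and evaluation steps mentioned in the abstract (majority vote over nine workers, then accuracy and Cohen's kappa against an expert) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the label strings, the toy votes, the expert labels, and the tie-breaking rule are invented, and the code is not taken from the CODA-19 repository. The vote function returns the most frequent label (a plurality), which may differ from the exact rule the authors used.

```python
from collections import Counter

# Hypothetical label names for the five aspects named in the abstract.
LABELS = ["background", "purpose", "method", "finding", "other"]


def majority_vote(votes):
    """Return the most frequent label among the votes.

    Ties are broken by the order of LABELS, which is an assumption
    made for this sketch only.
    """
    counts = Counter(votes)
    best = max(counts.values())
    for label in LABELS:
        if counts.get(label, 0) == best:
            return label
    raise ValueError("the most frequent vote is not a known label")


def accuracy(a, b):
    """Fraction of positions where the two label sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two annotators."""
    n = len(a)
    p_o = accuracy(a, b)  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance from each annotator's label frequencies.
    p_e = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators always use the same single label
    return (p_o - p_e) / (1 - p_e)


# Toy data: nine invented worker votes for each of four text segments.
crowd_votes = [
    ["background"] * 6 + ["purpose"] * 3,
    ["purpose"] * 5 + ["method"] * 4,
    ["method"] * 7 + ["finding"] * 2,
    ["finding"] * 4 + ["method"] * 3 + ["other"] * 2,
]
crowd_labels = [majority_vote(v) for v in crowd_votes]

# Invented expert labels for the same four segments.
expert_labels = ["background", "purpose", "method", "method"]

print(crowd_labels)
print(accuracy(crowd_labels, expert_labels))
print(cohen_kappa(crowd_labels, expert_labels))
```

On this toy input the crowd and the expert disagree on one of four segments. Kappa comes out lower than raw accuracy because it discounts the agreement expected by chance, which is why the abstract reports both numbers.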

