ORCID-ORCID 用于评价作者名称的标签数据 (ORCID-linked labeled data for evaluating author name disambiguation at scale) - 专知论文

会员服务 ·

0

Performer · 标注 · 示例 · 缩放 · Continuity ·

2021 年 2 月 5 日

ORCID-linked labeled data for evaluating author name disambiguation at scale

翻译：ORCID-ORCID 用于评价作者名称的标签数据

Jinseok Kim,Jason Owen-Smith

from arxiv, A pre-print of a paper accepted for publication in the journal Scientometrics

How can we evaluate the performance of a disambiguation method implemented on big bibliographic data? This study suggests that the open researcher profile system, ORCID, can be used as an authority source to label name instances at scale. This study demonstrates the potential by evaluating the disambiguation performances of Author-ity2009 (which algorithmically disambiguates author names in MEDLINE) using 3 million name instances that are automatically labeled through linkage to 5 million ORCID researcher profiles. Results show that although ORCID-linked labeled data do not effectively represent the population of name instances in Author-ity2009, they do effectively capture the 'high precision over high recall' performances of Author-ity2009. In addition, ORCID-linked labeled data can provide nuanced details about the Author-ity2009's performance when name instances are evaluated within and across ethnicity categories. As ORCID continues to be expanded to include more researchers, labeled data via ORCID-linkage can be improved in representing the population of a whole disambiguated data and updated on a regular basis. This can benefit author name disambiguation researchers and practitioners who need large-scale labeled data but lack resources for manual labeling or access to other authority sources for linkage-based labeling. The ORCID-linked labeled data for Author-tiy2009 are publicly available for validation and reuse.

翻译：我们如何评价大书目数据中采用的模糊化方法的性能?本研究表明,开放的研究人员剖面系统ORCID可以作为一个权威来源,用于在规模上标出名称实例。本研究通过评估2009年作者身份的模糊化表现(在逻辑上模糊化作者名称),利用300万个通过链接链接到500万ORCID研究者概况而自动标出的名称实例来评估?结果显示,尽管ORCID链接的标签数据无法有效代表2009年作者身份中的名称实例,但它们确实能够有效捕捉“高精确度超过2009年作者身份的高回顾性能”的权威性来源。此外,ORCID链接的标签数据可以提供2009年作者身份的模糊性表现细节(在对族裔内部和跨族裔类别中标出名称实例时,这种模糊性表现在逻辑上模糊化了作者身份,因为ORCID链接的标签数据可以在代表整个基于模糊化的数据和定期更新的人群方面加以改进。这有利于作者命名不清晰化的研究人员和业者,他们需要通过大规模标签标签的标签链接链接的其他数据来源。

0

相关内容

Performer

商业数据分析，39页ppt

商业数据分析，39页ppt

专知会员服务

165+阅读 · 2020年6月2日

Python分布式计算，171页pdf，Distributed Computing with Python

Python分布式计算，171页pdf，Distributed Computing with Python

专知会员服务

108+阅读 · 2020年5月3日

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

专知会员服务

67+阅读 · 2020年3月28日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【电子书】机器学习实战（Machine Learning in Action），附PDF

【电子书】机器学习实战（Machine Learning in Action），附PDF

专知会员服务

131+阅读 · 2019年11月25日

【ACL 2019 Tutorials】把维基百科作为文本分析和检索的资源（Wikipedia as a Resource for Text Analysis and Retrieval），Marius Pasca

【ACL 2019 Tutorials】把维基百科作为文本分析和检索的资源（Wikipedia as a Resource for Text Analysis and Retrieval），Marius Pasca

专知会员服务

7+阅读 · 2019年11月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Python机器学习教程资料/代码

Python机器学习教程资料/代码

机器学习研究会

8+阅读 · 2018年2月22日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

A Novel Approach to Detect Redundant Activity Labels For More Representative Event Logs

Arxiv

0+阅读 · 2021年3月30日

Collecting large-scale publication data at the level of individual researchers: A practical proposal for author name disambiguation

Arxiv

0+阅读 · 2021年3月26日

Evaluating Multimodal Representations on Visual Semantic Textual Similarity

Evaluating Multimodal Representations on Visual Semantic Textual Similarity

Arxiv

6+阅读 · 2020年4月4日

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

Arxiv

3+阅读 · 2019年9月26日

Language Modelling Makes Sense: Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation

Arxiv

3+阅读 · 2019年6月24日

Generating Fine-Grained Open Vocabulary Entity Type Descriptions

Arxiv

4+阅读 · 2018年5月27日

Metric for Automatic Machine Translation Evaluation based on Universal Sentence Representations

Arxiv

4+阅读 · 2018年5月18日

An Improved Evaluation Framework for Generative Adversarial Networks

Arxiv

3+阅读 · 2018年3月27日

Knowledge-based Word Sense Disambiguation using Topic Models

Arxiv

5+阅读 · 2018年1月5日

Scale Up Event Extraction Learning via Automatic Training Data Generation

Arxiv

7+阅读 · 2017年12月11日

VIP会员

文章信息

相关主题

相关VIP内容

商业数据分析，39页ppt

商业数据分析，39页ppt

专知会员服务

165+阅读 · 2020年6月2日

Python分布式计算，171页pdf，Distributed Computing with Python

Python分布式计算，171页pdf，Distributed Computing with Python

专知会员服务

108+阅读 · 2020年5月3日

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

【伯克利】元学习的元基线，A New Meta-Baseline for Few-Shot Learning

专知会员服务

67+阅读 · 2020年3月28日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【电子书】机器学习实战（Machine Learning in Action），附PDF

【电子书】机器学习实战（Machine Learning in Action），附PDF

专知会员服务

131+阅读 · 2019年11月25日

【ACL 2019 Tutorials】把维基百科作为文本分析和检索的资源（Wikipedia as a Resource for Text Analysis and Retrieval），Marius Pasca

【ACL 2019 Tutorials】把维基百科作为文本分析和检索的资源（Wikipedia as a Resource for Text Analysis and Retrieval），Marius Pasca

专知会员服务

7+阅读 · 2019年11月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

全球AI工具市场发展现状与趋势分析2025

自动驾驶地图：全流程综述与前沿进展

协同智能体：多智能体人工智能系统如何变革军事训练及其他领域

【NeurIPS2025】TITAN：一种面向轨迹感知的大规模 VQE 自适应参数冻结技术

相关资讯

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Python机器学习教程资料/代码

Python机器学习教程资料/代码

机器学习研究会

8+阅读 · 2018年2月22日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

相关论文

A Novel Approach to Detect Redundant Activity Labels For More Representative Event Logs

Arxiv

0+阅读 · 2021年3月30日

Collecting large-scale publication data at the level of individual researchers: A practical proposal for author name disambiguation

Arxiv

0+阅读 · 2021年3月26日

Evaluating Multimodal Representations on Visual Semantic Textual Similarity

Evaluating Multimodal Representations on Visual Semantic Textual Similarity

Arxiv

6+阅读 · 2020年4月4日

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

Arxiv

3+阅读 · 2019年9月26日

Language Modelling Makes Sense: Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation

Arxiv

3+阅读 · 2019年6月24日

Generating Fine-Grained Open Vocabulary Entity Type Descriptions

Arxiv

4+阅读 · 2018年5月27日

Metric for Automatic Machine Translation Evaluation based on Universal Sentence Representations

Arxiv

4+阅读 · 2018年5月18日

An Improved Evaluation Framework for Generative Adversarial Networks

Arxiv

3+阅读 · 2018年3月27日

Knowledge-based Word Sense Disambiguation using Topic Models

Arxiv

5+阅读 · 2018年1月5日

Scale Up Event Extraction Learning via Automatic Training Data Generation

Arxiv

7+阅读 · 2017年12月11日

微信扫码咨询专知VIP会员