Named entity recognition (NER), also known as entity identification, entity chunking, and entity extraction, is a subtask of information extraction that aims to locate named-entity mentions in unstructured text and classify them into predefined categories such as person names, place names, organization names, and other proper nouns.
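
To make the task concrete: NER systems commonly emit one label per token in the BIO scheme (B- begins an entity, I- continues it, O marks tokens outside any entity). A minimal illustrative Python sketch (tokens, tags, and the helper name are invented for this example) that converts BIO tags back into entity spans:

```python
def extract_entities(tokens, tags):
    """Collect (entity_text, entity_type) spans from BIO tags."""
    entities, current, etype = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:  # close the previous entity
                entities.append((" ".join(current), etype))
            current, etype = [token], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(token)  # continue the open entity
        else:
            if current:  # O tag closes any open entity
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities

tokens = ["Jiawei", "Han", "works", "at", "UIUC", "in", "Illinois"]
tags   = ["B-PER", "I-PER", "O", "O", "B-ORG", "O", "B-LOC"]
print(extract_entities(tokens, tags))
# [('Jiawei Han', 'PER'), ('UIUC', 'ORG'), ('Illinois', 'LOC')]
```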


Named Entity Recognition: Zhuanzhi Curated Resources

Surveys

  1. A Survey on Deep Learning for Named Entity Recognition. Jing Li, Aixin Sun, Jianglei Han, Chenliang Li.

  2. A Review of Named Entity Recognition (NER) Using Automatic Summarization of Resumes

Models and Algorithms

  1. NCRF++ (LSTM + CRF): Design Challenges and Misconceptions in Neural Sequence Labeling. COLING 2018.

  2. CNN+CRF:

  3. BERT+(LSTM)+CRF:

Getting Started

  1. CRF for NLP (sequence labeling tasks): a detailed walkthrough of CRF++, a detailed explanation of the CRF layer in Bi-LSTM+CRF, why a CRF layer is added after the Bi-LSTM, and the difference between the optimization objectives of CRF and Bi-LSTM+CRF

  2. A detailed explanation of the CRF layer in BiLSTM+CRF

  3. The CRF layer in BiLSTM-CRF explained, part 2

  4. The CRF layer in BiLSTM-CRF explained, part 3

  5. CRF vs. LSTM for sequence labeling: which is better?

  6. A comparison of CRF and LSTM

  7. A beginner's primer: a few notes on named entity recognition (NER)

  8. Fundamental yet far from simple: the difficulties and current state of named entity recognition

  9. An intuitive explanation of the CRF layer in the BiLSTM-CRF named entity recognition model
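
A recurring point in the articles above is why a CRF layer is placed after the BiLSTM: the BiLSTM scores tags for each token independently (emission scores), while the CRF adds learned transition scores between adjacent tags, and Viterbi decoding then selects the globally best tag sequence, ruling out invalid patterns such as I-PER directly after O. A minimal pure-Python decoding sketch (every score below is invented purely for illustration):

```python
# Viterbi decoding for a linear-chain CRF layer (illustrative sketch).
# emissions[t][tag]: per-token tag scores (in BiLSTM-CRF, the BiLSTM output);
# transitions[prev][tag]: learned score for `tag` following `prev`.

TAGS = ["O", "B-PER", "I-PER"]

def viterbi(emissions, transitions):
    # best[tag] = (score of best path ending in tag, that path)
    best = {t: (emissions[0][t], [t]) for t in TAGS}
    for em in emissions[1:]:
        new_best = {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: best[p][0] + transitions[p][tag])
            score = best[prev][0] + transitions[prev][tag] + em[tag]
            new_best[tag] = (score, best[prev][1] + [tag])
        best = new_best
    return max(best.values())[1]

# Strongly penalize I-PER unless it follows B-PER or I-PER.
transitions = {
    "O":     {"O": 0.0, "B-PER": 0.0, "I-PER": -10.0},
    "B-PER": {"O": 0.0, "B-PER": 0.0, "I-PER": 1.0},
    "I-PER": {"O": 0.0, "B-PER": 0.0, "I-PER": 1.0},
}
# Emissions for a 3-token sentence: the second token slightly prefers
# B-PER on its own, but the transition scores make I-PER win globally.
emissions = [
    {"O": 0.1, "B-PER": 2.0, "I-PER": 0.0},
    {"O": 0.1, "B-PER": 1.2, "I-PER": 1.0},
    {"O": 2.0, "B-PER": 0.0, "I-PER": 0.0},
]
print(viterbi(emissions, transitions))
# ['B-PER', 'I-PER', 'O']
```

Per-token argmax would tag the second token B-PER; the transition scores are what turn the prediction into a coherent two-token person entity.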

Important Reports

Tutorial

1. (PyTorch) Advanced: Making Dynamic Decisions and the Bi-LSTM CRF - [https://pytorch.org/tutorials/beginner/nlp/advanced_tutorial.html]

Code

1. Chinese named entity recognition (implementations of several models: HMM, CRF, BiLSTM, and BiLSTM+CRF)

  - [https://github.com/luopeixiang/named_entity_recognition]

Domain Experts

1. Huawei Noah's Ark Lab - Hang Li []

2. University of Illinois - Jiawei Han [https://hanj.cs.illinois.edu/]

Named Entity Recognition Tools

  1. Stanford NER
  2. MALLET
  3. Hanlp
  4. NLTK
  5. spaCy
  6. Ohio State University Twitter NER

Related Datasets

  1. CCKS2017: open data for the Chinese electronic medical record evaluation. Evaluation task 1:

  2. CCKS2018: an open entity recognition task in the music domain. Evaluation task:

     - [https://biendata.com/competition/CCKS2018_2/]

  3. NLPCC2018: an open spoken language understanding evaluation for task-oriented dialogue systems.

  4. CoNLL 2003

     - [https://www.clips.uantwerpen.be/conll2003/ner/]

Advanced Papers


Featured Content

Abstract: In natural language processing, information extraction has long attracted attention. Information extraction comprises three main subtasks: entity extraction, relation extraction, and event extraction, with relation extraction being the core task and a key step of information extraction. The main goal of entity relation extraction is to identify and determine the specific relation that holds between a pair of entities in natural language text. This provides foundational support for intelligent retrieval, semantic analysis, and other applications, helps improve search efficiency, and facilitates the automatic construction of knowledge bases. This survey reviews the history of entity relation extraction and introduces commonly used Chinese and English relation extraction tools and evaluation frameworks. It presents entity relation extraction methods from four angles: early traditional methods, methods based on classical machine learning, methods based on deep learning, and open-domain relation extraction. It summarizes the mainstream approaches and representative results of each period, compares the various extraction techniques, and concludes with key future research directions and trends for entity relation extraction.

http://crad.ict.ac.cn/CN/10.7544/issn1000-1239.2020.20190358#1
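
The "early traditional methods" the abstract refers to were largely hand-written pattern matching. As an illustration of that style only (not the survey's own code), a toy rule that extracts a single employment relation with a regular expression; the pattern, relation name, and sentence are all invented:

```python
import re

# One hand-written surface pattern mapped to a (head, relation, tail) triple.
EMPLOYMENT = re.compile(
    r"(?P<person>[A-Z]\w+(?: [A-Z]\w+)*) works at (?P<org>[A-Z]\w+(?: [A-Z]\w+)*)"
)

def extract_employment(sentence):
    """Return (person, 'works_at', org) if the pattern matches, else None."""
    m = EMPLOYMENT.search(sentence)
    if m:
        return (m.group("person"), "works_at", m.group("org"))
    return None

print(extract_employment("Jiawei Han works at UIUC."))
# ('Jiawei Han', 'works_at', 'UIUC')
```

High precision on exactly-matching sentences and near-zero recall elsewhere is precisely the limitation that pushed the field toward the machine-learning and deep-learning methods the survey covers.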


Latest Content

Motivation: Recognizing named entities (NER) and their associated attributes like negation are core tasks in natural language processing. However, manually labeling data for entity tasks is time consuming and expensive, creating barriers to using machine learning in new medical applications. Weakly supervised learning, which automatically builds imperfect training sets from low cost, less accurate labeling rules, offers a potential solution. Medical ontologies are compelling sources for generating labels, however combining multiple ontologies without ground truth data creates challenges due to label noise introduced by conflicting entity definitions. Key questions remain on the extent to which weakly supervised entity classification can be automated using ontologies, or how much additional task-specific rule engineering is required for state-of-the-art performance. Also unclear is how pre-trained language models, such as BioBERT, improve the ability to generalize from imperfectly labeled data. Results: We present Trove, a framework for weakly supervised entity classification using medical ontologies. We report state-of-the-art, weakly supervised performance on two NER benchmark datasets and establish new baselines for two entity classification tasks in clinical text. We perform within an average of 3.5 F1 points (4.2%) of NER classifiers trained with hand-labeled data. Automatically learning label source accuracies to correct for label noise provided an average improvement of 3.9 F1 points. BioBERT provided an average improvement of 0.9 F1 points. We measure the impact of combining large numbers of ontologies and present a case study on rapidly building classifiers for COVID-19 clinical tasks. Our framework demonstrates how a wide range of medical entity classifiers can be quickly constructed using weak supervision and without requiring manually-labeled training data.
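
The aggregation problem the Trove abstract describes, combining votes from many imperfect labeling sources, can be illustrated with its simplest baseline: majority vote over non-abstaining sources. The label names and votes below are invented, and Trove itself goes further by learning per-source accuracies to reweight these votes:

```python
from collections import Counter

def majority_vote(votes_per_token):
    """Each inner list holds one vote per labeling source for a token;
    None means the source abstains. Ties break by Counter order."""
    labels = []
    for votes in votes_per_token:
        cast = [v for v in votes if v is not None]
        labels.append(Counter(cast).most_common(1)[0][0] if cast else "O")
    return labels

# Three sources vote on three tokens; the third source conflicts on token 2.
votes = [
    ["DRUG", "DRUG", None],          # token 0: two sources agree
    [None, None, None],              # token 1: all sources abstain -> "O"
    ["DISEASE", "DISEASE", "DRUG"],  # token 2: majority wins over the conflict
]
print(majority_vote(votes))
# ['DRUG', 'O', 'DISEASE']
```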

