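Named entity recognition (NER), also known as entity identification, entity chunking, and entity extraction, is a subtask of information extraction. It aims to locate named entities mentioned in unstructured text and classify them into predefined categories such as person names, place names, organization names, and other proper nouns.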
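To make "locate and classify" concrete, here is a minimal sketch that turns per-token tags into entity spans. It assumes the common BIO tagging scheme (B- begins an entity, I- continues it, O is outside), which the definition above does not prescribe. The function name `bio_to_spans` and the example sentence are illustrative only.

```python
def bio_to_spans(tokens, tags):
    """Collect (entity_text, label, start, end) tuples from BIO tags.

    `end` is exclusive. A stray I- tag with no open entity, or with a
    different label than the open entity, is treated as starting a new one.
    """
    spans = []
    start, label = None, None
    # A trailing "O" sentinel closes an entity that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.append((" ".join(tokens[start:i]), label, start, i))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


tokens = ["Jiawei", "Han", "works", "at", "the", "University", "of", "Illinois"]
tags = ["B-PER", "I-PER", "O", "O", "O", "B-ORG", "I-ORG", "I-ORG"]
print(bio_to_spans(tokens, tags))
```

In a real system the tags come from a trained sequence labeler; this helper covers only the last step of reading entities off its output.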

Knowledge Collection

Named Entity Recognition (NER): Zhuanzhi Collection

Surveys

  1. Jing Li, Aixin Sun, Jianglei Han, Chenliang Li

  2. A Review of Named Entity Recognition (NER) Using Automatic Summarization of Resumes

Models and Algorithms

  1. NCRF++ (LSTM + CRF): Design Challenges and Misconceptions in Neural Sequence Labeling. COLING 2018.

  2. CNN+CRF:

  3. BERT+(LSTM)+CRF:

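Getting Started

  1. CRF applications in NLP (sequence labeling): a detailed walkthrough of CRF++, a detailed walkthrough of the CRF layer in Bi-LSTM+CRF, why a CRF is added after the Bi-LSTM, and how the optimization objectives of CRF and Bi-LSTM+CRF differ

  2. The CRF in BiLSTM+CRF explained

  3. Analysis of the CRF layer in BiLSTM-CRF, part 2

  4. Analysis of the CRF layer in BiLSTM-CRF, part 3

  5. Strengths and weaknesses of CRF and LSTM models for sequence labeling

  6. A comparison of CRF and LSTM

  7. Introductory reference: a few things about named entity recognition (NER)

  8. Basic but not simple: challenges and current state of named entity recognition

  9. An intuitive explanation of the CRF layer in the BiLSTM-CRF named entity recognition model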
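Several of the articles above discuss what the CRF layer adds on top of a BiLSTM. As a rough illustration, the sketch below decodes the best tag sequence with the Viterbi algorithm from per-token emission scores and tag-to-tag transition scores. It is a simplified, assumption-laden version: it uses plain Python lists, omits start and stop transitions, and the scores in the example are made up rather than learned.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_score, best_path) for a linear-chain CRF.

    emissions[t][j]   : score of tag j at position t (e.g. BiLSTM output)
    transitions[i][j] : score of moving from tag i to tag j
    Start/stop transitions are omitted in this simplified sketch.
    """
    n_tags = len(transitions)
    scores = list(emissions[0])  # best score of any path ending in each tag
    backptrs = []
    for emit in emissions[1:]:
        new_scores, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag i for arriving at tag j.
            best_i = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            ptrs.append(best_i)
            new_scores.append(scores[best_i] + transitions[best_i][j] + emit[j])
        scores = new_scores
        backptrs.append(ptrs)
    # Follow back-pointers from the best final tag.
    last = max(range(n_tags), key=lambda j: scores[j])
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return scores[last], path


# Tags: 0 = O, 1 = B, 2 = I. The large negative score forbids O -> I.
transitions = [
    [0.0, 0.0, -1e4],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
emissions = [
    [2.0, 1.5, 0.0],  # per-token argmax would pick O here ...
    [0.0, 0.0, 2.0],  # ... and I here, an invalid O -> I sequence
]
print(viterbi_decode(emissions, transitions))  # (3.5, [1, 2]), i.e. B then I
```

Picking the highest-scoring tag per token would give the invalid sequence O, I; scoring whole sequences with the transition matrix yields B, I instead. That constraint over label sequences is the usual motivation for adding a CRF after the BiLSTM.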

Key Talks

Tutorial

1. (PyTorch) Advanced: Making Dynamic Decisions and the Bi-LSTM CRF - [https://pytorch.org/tutorials/beginner/nlp/advanced_tutorial.html]

Code

1. Chinese named entity recognition (with implementations of several models: HMM, CRF, BiLSTM, BiLSTM+CRF)

  - [https://github.com/luopeixiang/named_entity_recognition]

Domain Experts

1. Huawei Noah's Ark Lab - Hang Li []

2. University of Illinois, USA - Jiawei Han [https://hanj.cs.illinois.edu/]

NER Tools

  1. Stanford NER
  2. MALLET
  3. HanLP
  4. NLTK
  5. spaCy
  6. Ohio State University Twitter NER

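Related Datasets

  1. CCKS2017: open data for the Chinese electronic medical record evaluation. Evaluation task 1:

  2. CCKS2018: open entity recognition task in the music domain. Evaluation task:

  - [https://biendata.com/competition/CCKS2018_2/]

  3. NLPCC2018: open evaluation of spoken language understanding in task-oriented dialogue systems.

  4. CoNLL 2003

  - [https://www.clips.uantwerpen.be/conll2003/ner/]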

Advanced Papers

1999

2005

2006

2008

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018

2019

2020

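VIP Content

Named Entity Recognition (NER) is a classic research topic in natural language processing and a foundational technology for tasks such as intelligent question answering and knowledge graphs. Domain Named Entity Recognition (DNER) is NER targeted at a specific domain. Driven by deep learning, Chinese DNER has made breakthrough progress. This survey outlines the research framework of Chinese DNER and reviews existing work in China and abroad from four angles: determining domain data sources, defining domain entity types and specifications, annotation guidelines for domain datasets, and evaluation metrics for Chinese DNER. It summarizes the common technical frameworks for Chinese DNER, introducing pattern matching methods based on dictionaries and rules, statistical machine learning methods, deep learning methods, and deep learning methods that fuse multiple sources, with a focused analysis of Chinese DNER methods based on word embedding representations and deep learning. It also discusses typical application scenarios of Chinese DNER and offers an outlook on future directions.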
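Of the method families listed in this abstract, dictionary-based pattern matching is the simplest to illustrate. The sketch below uses forward maximum matching over a character string, which is one common way to apply a domain lexicon to unsegmented Chinese text. It is not taken from the survey: the function name `dictionary_ner`, the lexicon entries, and the labels are all hypothetical.

```python
def dictionary_ner(text, lexicon, max_len=None):
    """Forward maximum matching: at each position, take the longest lexicon entry.

    lexicon maps surface strings to entity labels and must be non-empty.
    Returns (entity_text, label, start, end) tuples with exclusive `end`.
    """
    max_len = max_len or max(map(len, lexicon))
    entities, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            label = lexicon.get(text[i:j])
            if label is not None:
                entities.append((text[i:j], label, i, j))
                i = j
                break
        else:
            i += 1  # no entry starts here; move one character on
    return entities


# Hypothetical medical lexicon; "血压" is a substring of "高血压".
lexicon = {"高血压": "DISEASE", "血压": "SIGN", "阿司匹林": "DRUG"}
print(dictionary_ner("患者有高血压,服用阿司匹林", lexicon))
```

Because the longest match wins, "高血压" is tagged as one DISEASE entity instead of leaving "血压" to match as SIGN. Such matchers are precise on known terms but cannot find entities missing from the lexicon, which is one reason the statistical and deep learning approaches listed above are used.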


Latest Papers

Sequence labeling is a fundamental task in natural language processing and has been widely studied. Recently, RNN-based sequence labeling models have gained increasing attention. Despite the superior performance achieved by learning long short-term (i.e., successive) dependencies, processing inputs sequentially might limit the ability to capture non-continuous relations over tokens within a sentence. To tackle this problem, we focus on how to effectively model the successive and discrete dependencies of each token to enhance sequence labeling performance. Specifically, we propose an innovative attention-based model (called position-aware self-attention, i.e., PSA) as well as a well-designed self-attentional context fusion layer within a neural network architecture, to explore the positional information of an input sequence for capturing the latent relations among tokens. Extensive experiments on three classical tasks in the sequence labeling domain, i.e., part-of-speech (POS) tagging, named entity recognition (NER), and phrase chunking, demonstrate that our proposed model outperforms the state of the art without any external knowledge, in terms of various metrics.
