Grammatical error correction (GEC) is a challenging task in natural language processing. While much work targets high-resource languages such as English and Chinese, relatively little has been done for low-resource languages, owing to the lack of large annotated corpora. For such languages, unsupervised GEC based on language-model scoring performs well, but pre-trained language models remain under-explored in this setting. This study proposes a BERT-based unsupervised GEC framework that casts GEC as a multi-class classification task. The framework comprises three modules: data-flow construction, sentence perplexity scoring, and error detection and correction. We propose a novel pseudo-perplexity scoring method to estimate how likely a sentence is to be correct, and we construct a Tagalog corpus for Tagalog GEC research. Our framework achieves competitive performance on this corpus and on an open-source Indonesian corpus, demonstrating that it complements baseline methods for the low-resource GEC task.
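The pseudo-perplexity idea mentioned above can be illustrated with a minimal sketch: mask each token in turn, sum the masked language model's log-probability of the true token, and exponentiate the negative per-token average. The real framework would query a pre-trained BERT; here `toy_logprob` is a hypothetical stand-in scorer introduced only for illustration.

```python
import math

def toy_logprob(tokens, i):
    # Hypothetical stand-in for a masked LM: returns log P(tokens[i] | context
    # with position i masked). A real system would call BERT here.
    freq = {"the": 0.5, "cat": 0.2, "sat": 0.2, "sit": 0.05}
    return math.log(freq.get(tokens[i], 0.01))

def pseudo_log_likelihood(tokens, logprob=toy_logprob):
    # Mask each position in turn and sum the log-probability of the true token.
    return sum(logprob(tokens, i) for i in range(len(tokens)))

def pseudo_perplexity(tokens, logprob=toy_logprob):
    # PPPL = exp(-PLL / N); a lower score suggests a more fluent sentence.
    return math.exp(-pseudo_log_likelihood(tokens, logprob) / len(tokens))

# The grammatical variant should score a lower pseudo-perplexity.
good = ["the", "cat", "sat"]
bad = ["the", "cat", "sit"]
assert pseudo_perplexity(good) < pseudo_perplexity(bad)
```

In an unsupervised GEC loop, candidate corrections for a detected error position would be ranked by this score, and the lowest-perplexity candidate chosen.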