What Have Been Learned & What Should Be Learned? An Empirical Study of How to Selectively Augment Text for Classification

September 1, 2021

Biyang Guo, Songqiao Han, Hailiang Huang

Text augmentation techniques are widely used in text classification to improve classifier performance, especially in low-resource scenarios. Although many creative text augmentation methods have been designed, they augment text non-selectively: less important or noisy words have the same chance of being augmented as informative words, which limits the benefit of augmentation. In this work, we systematically summarize three kinds of role keywords, which serve different functions in text classification, and design effective methods to extract them from text. Based on the extracted role keywords, we propose STA (Selective Text Augmentation), which augments text selectively: informative, class-indicating words are emphasized, while irrelevant or noisy words are diminished. Extensive experiments on four English and Chinese text classification benchmark datasets demonstrate that STA can substantially outperform non-selective text augmentation methods.
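The abstract does not say how role keywords are extracted or which augmentation operations STA applies, so the following is only a minimal sketch of the general idea of selective augmentation, not the authors' method. It assumes a smoothed log-ratio of word frequencies as a stand-in "class-indicating" score, and a selective deletion step that only ever drops low-scoring words. The function names `class_scores` and `selective_delete`, and the parameters `threshold` and `p`, are hypothetical.

```python
import math
import random
from collections import Counter


def class_scores(docs, labels, target):
    """Assumed scoring rule (not from the paper): smoothed log-ratio of a word's
    relative frequency inside the target class versus all other classes."""
    inside, outside = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (inside if label == target else outside).update(tokens)
    vocab = set(inside) | set(outside)
    # Add-one smoothing so unseen words do not produce log(0).
    n_in = sum(inside.values()) + len(vocab)
    n_out = sum(outside.values()) + len(vocab)
    return {
        w: math.log((inside[w] + 1) / n_in) - math.log((outside[w] + 1) / n_out)
        for w in vocab
    }


def selective_delete(tokens, scores, threshold=0.0, p=0.5, rng=None):
    """Drop each low-scoring token with probability p; tokens scoring above
    the threshold are always kept, so class-indicating words survive."""
    rng = rng or random.Random(0)
    kept = [
        t for t in tokens
        if scores.get(t, 0.0) > threshold or rng.random() >= p
    ]
    # Never return an empty sample; fall back to the original tokens.
    return kept or list(tokens)


if __name__ == "__main__":
    docs = [
        ["the", "movie", "was", "great"],
        ["the", "plot", "was", "awful"],
        ["a", "great", "cast"],
        ["an", "awful", "film"],
    ]
    labels = ["pos", "neg", "pos", "neg"]
    scores = class_scores(docs, labels, "pos")
    print(selective_delete(docs[0], scores, p=0.5))
```

A non-selective baseline would delete every token with the same probability, regardless of its score; the only change here is the score check that shields informative words from deletion.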
