以抽样为基础的神经语言建模培训标准 (On Sampling-Based Training Criteria for Neural Language Modeling) - 专知论文

会员服务 ·

0

采样法 · 语言模型化 · 神经语言模型 · MoDELS · 词表 ·

2021 年 6 月 17 日

On Sampling-Based Training Criteria for Neural Language Modeling

翻译：以抽样为基础的神经语言建模培训标准

Yingbo Gao,David Thulke,Alexander Gerstenberger,Khoa Viet Tran,Ralf Schlüter,Hermann Ney

from arxiv, Accepted at INTERSPEECH 2021

As the vocabulary size of modern word-based language models becomes ever larger, many sampling-based training criteria are proposed and investigated. The essence of these sampling methods is that the softmax-related traversal over the entire vocabulary can be simplified, giving speedups compared to the baseline. A problem we notice about the current landscape of such sampling methods is the lack of a systematic comparison and some myths about preferring one over another. In this work, we consider Monte Carlo sampling, importance sampling, a novel method we call compensated partial summation, and noise contrastive estimation. Linking back to the three traditional criteria, namely mean squared error, binary cross-entropy, and cross-entropy, we derive the theoretical solutions to the training problems. Contrary to some common belief, we show that all these sampling methods can perform equally well, as long as we correct for the intended class posterior probabilities. Experimental results in language modeling and automatic speech recognition on Switchboard and LibriSpeech support our claim, with all sampling-based methods showing similar perplexities and word error rates while giving the expected speedups.

翻译：随着现代以字为基础的语言模型的词汇规模越来越大,许多基于抽样的培训标准都得到提出和调查。这些抽样方法的实质是,可以简化整个词汇中与软式马克斯相关的跨度,使速度与基线相比较。我们注意到,关于这种抽样方法当前景观的一个问题在于缺乏系统比较,以及某些关于相互偏爱的神话。在这项工作中,我们认为蒙特卡洛取样、重要取样、一种我们称之为部分加和的新颖方法,以及噪音对比估计。与三种传统标准,即平均正方形错误、二元交叉杂交和交叉成份联系起来,我们从理论上找到解决培训问题的解决方案。与一些共同的看法相反,我们发现所有这些抽样方法都能同样地发挥作用,只要我们纠正了预期的阶级后表概率。在交换台和利布里施佩奇的语言建模和自动语音识别方面的实验结果支持我们的要求,所有基于取样的方法都显示出类似的折叠和字词错率,同时给出了预期的加速率。

0

相关内容

采样法

预训练语言模型fine-tuning近期进展概述

预训练语言模型fine-tuning近期进展概述

专知会员服务

40+阅读 · 2021年4月9日

【EMNLP2020】自然语言生成，Neural Language Generation

【EMNLP2020】自然语言生成，Neural Language Generation

专知会员服务

39+阅读 · 2020年11月20日

【新书】R语言统计学习，R for Statistical Learning，301页pdf

专知会员服务

30+阅读 · 2020年11月4日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

专知会员服务

52+阅读 · 2020年6月1日

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

专知会员服务

51+阅读 · 2020年5月3日

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

专知会员服务

28+阅读 · 2019年11月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

GPT3 论文详解 | GPT-3: Language Models are Few-Shot Learners

GPT3 论文详解 | GPT-3: Language Models are Few-Shot Learners

AINLP

8+阅读 · 2020年6月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Modeling Performance and Energy trade-offs in Online Data-Intensive Applications

Modeling Performance and Energy trade-offs in Online Data-Intensive Applications

Arxiv

0+阅读 · 2021年8月18日

Energy-Based Open-World Uncertainty Modeling for Confidence Calibration

Arxiv

0+阅读 · 2021年8月16日

Neural-to-Tree Policy Distillation with Policy Improvement Criterion

Arxiv

0+阅读 · 2021年8月16日

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

Arxiv

9+阅读 · 2021年2月8日

Language Models as Knowledge Bases?

Arxiv

6+阅读 · 2019年9月4日

Unsupervised Data Augmentation for Consistency Training

Arxiv

5+阅读 · 2019年7月10日

DSGAN: Generative Adversarial Training for Distant Supervision Relation Extraction

Arxiv

15+阅读 · 2018年5月24日

Interpretable and Compositional Relation Learning by Joint Training with an Autoencoder

Arxiv

4+阅读 · 2018年5月24日

Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model

Arxiv

7+阅读 · 2018年1月23日

Language Modeling with Gated Convolutional Networks

Arxiv

5+阅读 · 2017年9月8日

VIP会员

文章信息

相关主题

语言模型化

神经语言模型

相关VIP内容

预训练语言模型fine-tuning近期进展概述

预训练语言模型fine-tuning近期进展概述

专知会员服务

40+阅读 · 2021年4月9日

【EMNLP2020】自然语言生成，Neural Language Generation

【EMNLP2020】自然语言生成，Neural Language Generation

专知会员服务

39+阅读 · 2020年11月20日

【新书】R语言统计学习，R for Statistical Learning，301页pdf

专知会员服务

30+阅读 · 2020年11月4日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

专知会员服务

52+阅读 · 2020年6月1日

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

专知会员服务

51+阅读 · 2020年5月3日

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

专知会员服务

28+阅读 · 2019年11月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《美空军条令出版物：战略打击》最新条令

《高能激光武器》22页slides

军事前沿模型

《面向小型无人机或无人飞行器的创新雷达探测与人工智能分类技术》263页

相关资讯

GPT3 论文详解 | GPT-3: Language Models are Few-Shot Learners

GPT3 论文详解 | GPT-3: Language Models are Few-Shot Learners

AINLP

8+阅读 · 2020年6月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Modeling Performance and Energy trade-offs in Online Data-Intensive Applications

Modeling Performance and Energy trade-offs in Online Data-Intensive Applications

Arxiv

0+阅读 · 2021年8月18日

Energy-Based Open-World Uncertainty Modeling for Confidence Calibration

Arxiv

0+阅读 · 2021年8月16日

Neural-to-Tree Policy Distillation with Policy Improvement Criterion

Arxiv

0+阅读 · 2021年8月16日

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

Arxiv

9+阅读 · 2021年2月8日

Language Models as Knowledge Bases?

Arxiv

6+阅读 · 2019年9月4日

Unsupervised Data Augmentation for Consistency Training

Arxiv

5+阅读 · 2019年7月10日

DSGAN: Generative Adversarial Training for Distant Supervision Relation Extraction

Arxiv

15+阅读 · 2018年5月24日

Interpretable and Compositional Relation Learning by Joint Training with an Autoencoder

Arxiv

4+阅读 · 2018年5月24日

Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model

Arxiv

7+阅读 · 2018年1月23日

Language Modeling with Gated Convolutional Networks

Arxiv

5+阅读 · 2017年9月8日

微信扫码咨询专知VIP会员