When building a network model, the parameters are first randomly initialized and training then begins, adjusting them until the network's loss keeps decreasing. Over the course of training, those initially random parameters change continuously. Once the parameters have reached a good state, they can be saved so that the trained model can be reused and produce good results the next time a similar task is performed.
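A minimal sketch of this save-and-reload workflow, assuming PyTorch; the small model architecture and the file name checkpoint.pt are illustrative, not taken from the original text:

```python
# Minimal sketch (assumes PyTorch): save the trained parameters, then reload them later.
import torch
import torch.nn as nn

# A small illustrative model; any nn.Module works the same way.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# ... training loop updates the model's parameters here ...

# Save only the learned parameters (the state_dict), not the whole object.
torch.save(model.state_dict(), "checkpoint.pt")

# Later: rebuild the same architecture and load the saved parameters into it.
restored = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
restored.load_state_dict(torch.load("checkpoint.pt"))
restored.eval()  # switch to inference mode before reusing the model
```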

VIP Content

In recent years, large language models pretrained on massive amounts of raw text have revolutionized natural language processing. Existing methods, based on variants of causal or masked language modeling, now provide the de facto approach for every NLP task. In this talk, I will discuss recent work on language model pretraining, from ELMo, GPT, and BERT to more recent models. My goal is to give broad coverage of the overall trends, while providing more detail on the models we have recently developed at Facebook AI and the University of Washington. These include, in particular, pretraining methods for sequence-to-sequence models, such as BART, mBART, and MARGE, which offer some of the most broadly applicable approaches to date.
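As an illustration of reusing one such pretrained sequence-to-sequence model, the sketch below loads a published BART checkpoint through the Hugging Face Transformers library; the checkpoint name and the input text are examples chosen here, not part of the talk itself:

```python
# Illustrative sketch: reusing a pretrained seq2seq model (BART) via the
# Hugging Face Transformers library. "facebook/bart-large-cnn" is one
# publicly released summarization checkpoint, used purely as an example.
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

text = "Large language models pretrained on raw text have transformed NLP ..."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# The pretrained encoder-decoder generates a summary without task-specific training.
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```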


Latest Content

We apply recent advancements in end-to-end speech recognition to non-autoregressive automatic speech recognition. We push the limits of non-autoregressive state-of-the-art results for multiple datasets: LibriSpeech, Fisher+Switchboard and Wall Street Journal. Key to our recipe, we leverage CTC on giant Conformer neural network architectures with SpecAugment and wav2vec2 pre-training. We achieve 1.8%/3.6% WER on LibriSpeech test/test-other sets, 5.1%/9.8% WER on Switchboard, and 3.4% on the Wall Street Journal, all without a language model.
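As a hedged illustration of the CTC objective this abstract relies on, the sketch below computes a CTC loss in PyTorch; the Conformer encoder is stood in for by random log-probabilities, and all shapes and sizes are made up for the example:

```python
# Minimal sketch of the CTC objective (PyTorch). The Conformer encoder is
# replaced here by random "logits"; batch, time, and vocabulary sizes are illustrative.
import torch
import torch.nn as nn

batch, time_steps, vocab = 4, 100, 32   # vocab includes the CTC blank at index 0
log_probs = torch.randn(time_steps, batch, vocab).log_softmax(dim=-1)

targets = torch.randint(1, vocab, (batch, 20))      # label sequences (no blanks)
input_lengths = torch.full((batch,), time_steps)    # encoder output lengths
target_lengths = torch.full((batch,), 20)           # label sequence lengths

# CTC marginalizes over all alignments between encoder frames and labels,
# which is what allows decoding without autoregressive token-by-token generation.
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```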

