[CMU Fall 2017] Deep Learning for NLP Course: Slides, Videos, and Schedule

September 4, 2017, Xinzhiyuan (新智元)

A Xinzhiyuan compilation

Source: phontron.com

Compiled by: Neko

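[Xinzhiyuan editor's note] Carnegie Mellon University (CMU) has published the full syllabus and reading list for its fall course Neural Networks for NLP, along with slides, sample code, and lecture videos for the first two weeks. Materials for later lectures will be released as the course proceeds, so it is well suited to following along week by week. This article gives a brief overview of each lecture.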

Instructor / Teaching Assistants:

Instructor: Graham Neubig (gneubig@cs.cmu.edu)
TAs: Zhengzhong (Hector) Liu, Xuezhe (Max) Ma, Daniel Clothiaux

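Course Description

Neural networks provide powerful new tools for modeling language. They have been used both to improve the state of the art on many tasks and to tackle new problems that were previously hard. The course starts with a brief overview of neural networks, then spends most of its time on how to apply them to natural language processing problems. Each section introduces a particular problem or phenomenon in natural language, describes why it is difficult to model, and demonstrates several models that address it. Along the way, the course covers the techniques involved in building neural network models, including handling variably sized and structured sentences, processing large data efficiently, semi-supervised and unsupervised learning, structured prediction, and multilingual modeling.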

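Schedule

8/29 Class Introduction

Content:

Introduction to neural networks
Example tasks and their difficulties
Which neural networks help with these tasks?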

Reading:

Highly recommended: Neural Network Methods for Natural Language Processing

Synthesis Lectures on Human Language Technologies (Chapters 1-5)

By Yoav Goldberg

This book covers many neural network concepts that some students may already know. If you are already comfortable with neural networks, you can skip it; if not, read the first five chapters carefully.

Reference: Deep Unordered Composition (Iyyer et al.)

Slides: Class Intro Slides (http://phontron.com/class/nn4nlp2017/assets/slides/nn4nlp-01-intro.pdf)
Sample code: Class Intro Code Examples (https://github.com/neubig/nn4nlp2017-code/tree/master/01-intro)

Lecture video: Class Intro Lecture Video (https://youtu.be/Sss2EA4hhBQ)

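8/31 Practice: Predicting the Next Word in a Sentence

Content:

Computation graphs
Feed-forward neural network language models
Measuring model performance: likelihood and perplexity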
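
To make the last item concrete, here is a minimal sketch of how perplexity follows from likelihood. It assumes the language model has already assigned a probability to each token of a held-out text; the function name `perplexity` and the sample probabilities are illustrative and are not taken from the course code.

```python
import math

def perplexity(token_probs):
    """Perplexity of a text, given the model probability of each token.

    Perplexity is exp of the average negative log-likelihood per token.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    log_likelihood = sum(math.log(p) for p in token_probs)
    avg_neg_log_likelihood = -log_likelihood / len(token_probs)
    return math.exp(avg_neg_log_likelihood)

# Hypothetical model that is equally unsure among 4 words at every step:
# its perplexity comes out at (approximately) 4.
print(perplexity([0.25] * 8))
```

A lower perplexity means the model spreads less probability mass over wrong continuations, which is why it is a common way to compare language models on the same test text.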

Reading:

Highly recommended: Goldberg book (above), Chapters 8-9

References: Goldberg book, Chapters 6-7

Maximum entropy (log-linear) language models. (Rosenfeld 1996)

A Neural Probabilistic Language Model. (Bengio et al. 2003, JMLR)

An Overview of Gradient Descent Algorithms. (Ruder 2016)

The Marginal Value of Adaptive Gradient Methods. (Wilson et al. 2017)

Stronger Baselines for Neural MT. (Denkowski Neubig 2017)

Using the Output Embedding. (Press and Wolf 2016)

Slides: LM Slides (phontron.com/class/nn4nlp2017/assets/slides/nn4nlp-02-lm.pdf)

Sample code: LM Code Examples (https://github.com/neubig/nn4nlp2017-code/tree/master/02-lm)

Lecture video: LM Lecture Video (https://youtu.be/tNC9tpGqQb0)

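Section 1: Models of Words

9/5 Distributional Semantics and Word Vectors

Content:

Describing a word by the words around it
Counting and predicting
Skip-gram and CBOW
Evaluating/visualizing word vectors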
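
As an illustration of describing a word by its neighbors, the sketch below extracts the (center word, context word) training pairs that a skip-gram model would be trained on. The function name `skipgram_pairs` and the default window size are assumptions made for this example, not details from the course materials.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the cat sat on the mat".split()
for center, context in skipgram_pairs(sentence, window=1):
    print(center, "->", context)
```

Skip-gram predicts each context word from the center word; CBOW reverses the direction and predicts the center word from the combined context.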

Reading:

Goldberg book, Chapters 10-11

References:

WordNet（https://wordnet.princeton.edu）

Linguistic Regularities in Continuous Representations (Mikolov et al. 2013)

t-SNE (van der Maaten and Hinton 2008)

Visualizing w/ PCA vs. t-SNE (Derksen 2016)

How to use t-SNE effectively (Wattenberg et al. 2016)

Morphology-based Embeddings (Luong et al. 2013)

Character-based Embeddings (Ling et al. 2015)

Subword-based Embeddings (Bojanowski et al. 2017)

Multi-prototype Embeddings (Reisinger and Mooney 2010)

Cross-lingual Embeddings (Faruqui et al. 2014)

Retrofitting to Lexicons (Faruqui et al. 2015)

De-biasing Word Embeddings (Bolukbasi et al. 2016)

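9/7 Why Is word2vec So Fast? Speed Tricks for Neural Networks

Content (guest lecturer: Taylor Berg-Kirkpatrick):

Softmax approximations: negative sampling, hierarchical softmax
Parallel training
Tips for training on GPUs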
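
The speed-up from negative sampling comes from replacing the full softmax over the vocabulary with a handful of binary decisions. The sketch below computes the loss for one (center, context) pair plus k sampled negatives, in plain Python. It assumes the negatives have already been drawn from some noise distribution, and the names `negative_sampling_loss`, `center_vec`, `context_vec`, and `negative_vecs` are hypothetical.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def negative_sampling_loss(center_vec, context_vec, negative_vecs):
    """Loss for one true pair and k sampled negative words.

    The true context word is pushed toward a high score and each negative
    toward a low score, so the cost is O(k) rather than O(vocabulary size).
    """
    loss = -log_sigmoid(dot(context_vec, center_vec))
    for neg in negative_vecs:
        loss -= log_sigmoid(-dot(neg, center_vec))
    return loss

center = [0.5, -0.2]
context = [0.4, 0.1]
negatives = [[-0.3, 0.8], [0.1, -0.9]]
print(negative_sampling_loss(center, context, negatives))
```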

Reading:

Notes on Noise Contrastive Estimation and Negative Sampling (Dyer 2014)

References:

Importance Sampling (Bengio and Senécal, 2003)

Noise Contrastive Estimation (Mnih and Teh, 2012)

Negative Sampling (Goldberg and Levy, 2014)

Mini-batching Sampling-based Softmax Approximations (Zoph et al., 2015)

Class-based Softmax (Goodman 2001)

Hierarchical Softmax (Morin and Bengio 2005)

Error Correcting Codes (Dietterich and Bakiri 1995)

Binary Code Prediction for Language (Oda et al. 2017)

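Section 2: Models of Sentences

9/12 Bag of Words, Bag of n-grams, and Convolutional Networks

Content:

Example sentence modeling task: classification
Bag of words
Continuous bag of words (CBOW)
Continuous bag of n-grams: convolutional networks
The surprising power of simple models

Reading:

Goldberg book, Chapter 13

References: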

Convolutional Networks for Sentence Classification

Dilated Convolutions (Yu and Koltun, ICLR 2016)

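9/14 Recurrent Networks for Sentence or Language Modeling

Content:

Recurrent networks
Vanishing gradients and LSTMs
Strengths and weaknesses of recurrent networks in sentence modeling

Reading:

Goldberg book, Chapters 14-16

9/19 Applications of Sentence Modeling

Content:

Sentence similarity
Retrieval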

Section 3: Sequence-to-Sequence Models

9/21 Conditioned Generation

Content:

Encoder-decoder models
Conditional generation and search

Reading:

Neural Machine Translation and Sequence-to-Sequence Models, Chapter 7

By Graham Neubig

https://arxiv.org/pdf/1703.01619.pdf

9/26 Attention

Content:

Encoder-decoder models
Conditional generation and search

Reading:

Neural Machine Translation and Sequence-to-Sequence Models, Chapter 8

Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al. ICLR 2015)

Attention is All You Need (Vaswani et al., arXiv 2017)

Section 4: Structured Prediction Models

9/28 Search-based Structured Prediction

Content:

Structured perceptron
Structured max-margin objectives

Reading:

Course in Machine Learning, Chapter 17 (ciml.info/dl/v0_99/ciml-v0_99-ch17.pdf)

Beam Search Optimization (Wiseman and Rush, EMNLP 2016)

10/3 Structured Prediction with Local Independence Assumptions

Content:

The Viterbi algorithm
Margin-infused Viterbi
Conditional random fields
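
For readers who have not met it before, the Viterbi algorithm can be sketched in a few lines. The version below finds the highest-scoring tag sequence under a first-order model in which every score is a log-probability or an unnormalized log-potential. The argument layout (`emissions[t][s]`, `transitions[prev][s]`, `initial[s]`) and the toy numbers are assumptions chosen for this example.

```python
def viterbi(emissions, transitions, initial):
    """Return (best_path, best_score) for a first-order sequence model.

    emissions[t][s]      : log score of state s at position t
    transitions[prev][s] : log score of moving from state prev to state s
    initial[s]           : log score of starting in state s
    """
    n_states = len(initial)
    scores = [initial[s] + emissions[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for s in range(n_states):
            # Local independence: only the previous state matters here.
            best_prev = max(range(n_states),
                            key=lambda p: scores[p] + transitions[p][s])
            new_scores.append(scores[best_prev] + transitions[best_prev][s]
                              + emissions[t][s])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Follow the backpointers from the best final state.
    last = max(range(n_states), key=lambda s: scores[s])
    best_score = scores[last]
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path, best_score

# Toy example with 2 states and 3 positions (all numbers are made up).
initial = [0.0, -1.0]
transitions = [[-0.5, -2.0], [-2.0, -0.5]]
emissions = [[-1.0, -3.0], [-3.0, -0.2], [-3.0, -0.1]]
print(viterbi(emissions, transitions, initial))
```

Because each step only looks at the previous state, the search costs O(length × states²) instead of enumerating every possible sequence.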

Reading:

Bidirectional LSTM-CRF Models for Sequence Tagging (Huang et al. 2015)
End-to-end Sequence Labeling with Bi-directional LSTM-CNNs-CRF (Ma et al. 2016)

Section 5: Syntactic/Semantic Parsing Models

10/5 Shift-reduce Parsing Models

10/10 Minimum Spanning Tree Parsing Models

10/12 Graph-structured Models

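Section 6: Advanced Learning Techniques

10/17 Variational Auto-encoders

10/19 Adversarial Networks

10/24 Marginal Likelihood, Reinforcement Learning

10/26 Semi-supervised and Unsupervised Learning of Structure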

Section 7: Models of Documents and Discourse

10/31 Document Models

11/2 Discourse/Dialogue Models

Section 8: Neural Networks and Knowledge

11/7 Learning from/for Relational Databases