Stanford University Fall 2018 Course: Hardware Accelerators for Machine Learning (with slides download)

July 15, 2018  Zhuanzhi (专知)

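[Overview] Stanford University is offering the course "Hardware Accelerators for Machine Learning" in the Fall 2018 term. It covers the architectural techniques behind hardware accelerators for training and inference in machine learning systems. The course is both systematic and current, a rare offering in this field that is well worth a look.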

Course Overview

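This course provides in-depth coverage of the architectural techniques used to design accelerators for training and inference in machine learning systems. It covers classical ML algorithms, such as linear regression and support vector machines, as well as DNN models, such as convolutional neural networks and recurrent neural networks. We will consider both training and inference for these models and discuss how parameters such as batch size, precision, sparsity, and compression affect model accuracy. We will cover the design of accelerators for ML model inference and training. Students will become familiar with hardware implementation techniques that use parallelism, locality, and low precision to implement the core computational kernels used in ML. To design energy-efficient accelerators, students will develop the intuition needed to make trade-offs between ML model parameters and hardware implementation techniques. Students will read recent research papers and complete a design project.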
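One of the parameters mentioned above, numeric precision, can be illustrated on a toy example. The sketch below is not taken from the course materials: it assumes a simple symmetric uniform quantizer, and the weight values and the function names `quantize`, `dequantize`, and `max_abs_error` are made up for illustration. It shows only how the representation error of a weight vector grows as the bit width shrinks, which is one ingredient of the accuracy trade-off the course describes.

```python
def quantize(values, bits):
    """Symmetric uniform quantization to signed integers of width `bits`.

    Returns the integer codes and the scale needed to map them back.
    """
    qmax = 2 ** (bits - 1) - 1           # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def max_abs_error(values, bits):
    """Largest absolute difference between values and their quantized form."""
    codes, scale = quantize(values, bits)
    approx = dequantize(codes, scale)
    return max(abs(v - a) for v, a in zip(values, approx))


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -0.41, 0.05, -0.97, 0.33, 0.6]

for bits in (8, 4, 2):
    print(f"{bits}-bit max abs error: {max_abs_error(weights, bits):.4f}")
```

With this quantizer the error per value is bounded by half the scale, so each bit removed roughly doubles the worst-case error. Whether a model tolerates that error is an empirical question that depends on the model and the task.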

Course website:

https://cs217.github.io/

Instructors

Professor Kunle Olukotun:

http://arsenalfc.stanford.edu/kunle

Ardavan Pedram:

https://web.stanford.edu/~perdavan/

Course Schedule

Lecture	Topic	Reading	Spatial Assignment

1	Introduction, role of hardware accelerators in the post-Dennard and post-Moore era	Is Dark Silicon Useful?; Hennessy and Patterson, Chapter 7.1-7.2
2	Classical ML algorithms: Regression, SVMs (What is the building block?)	TABLA
3	Linear algebra fundamentals and accelerating linear algebra BLAS operations. 20th century techniques: Systolic arrays and MIMDs, CGRAs	Why Systolic Architectures?; Anatomy of high performance GEMM	Linear Algebra Accelerators
4	Evaluating Performance, Energy efficiency, Parallelism, Locality, Memory hierarchy, Roofline model	Dark Memory
5	Real-World Architectures: Putting it into practice. Accelerating GEMM: Custom, GPU, TPU1 architectures and their GEMM performance	Google TPU; Codesign Tradeoffs; NVIDIA Tesla V100
6	Neural networks: MLPs and CNNs Inference	Vivienne Sze's IEEE proceeding; Brooks’s book (Selected Chapters)	CNN Inference Accelerators
7	Accelerating Inference for CNNs: Blocking and Parallelism in practice. DianNao, Eyeriss, TPU1	Systematic Approach to Blocking; Eyeriss; Google TPU (see lecture 5)
8	Modeling neural networks with Spatial, Analyzing performance and energy with Spatial	Spatial; One related work
9	Training: SGD, back propagation, statistical efficiency, batch size	NIPS workshop last year; Graphcore	Training Accelerators
10	Resilience of DNNs: Sparsity and Low Precision Networks	Some theory paper; EIE; Flexpoint of Nervana; Boris Ginsburg: paper, presentation; LSTM Block Compression by Baidu?
11	Low precision training	HALP; Ternary or binary networks; See Boris Ginsburg's work (lecture 10)
12	Training in Distributed and Parallel systems: Hogwild!, asynchrony and hardware efficiency	Deep Gradient Compression; Hogwild!; Large Scale Distributed Deep Networks; Obstinate cache?
13	FPGAs and CGRAs: Catapult, Brainwave, Plasticine	Catapult; Brainwave; Plasticine
14	ML benchmarks: DAWNBench, MLPerf	DAWNBench; Some other benchmark paper
15	Project presentations
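
Lecture 4 in the schedule above lists the Roofline model, and several lectures center on GEMM. As a rough illustration that is not taken from the course materials, the sketch below applies the standard roofline bound, attainable performance = min(peak compute, memory bandwidth × arithmetic intensity), to a matrix multiply. The machine numbers (10 TFLOP/s peak, 500 GB/s bandwidth) are hypothetical. The traffic estimate assumes each of the three matrices crosses the memory interface exactly once, which is an idealization.

```python
def gemm_arithmetic_intensity(m, n, k, bytes_per_element=4):
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n].

    Assumes A and B are each read once and C is written once (ideal reuse).
    """
    flops = 2 * m * n * k                        # one multiply + one add per term
    bytes_moved = (m * k + k * n + m * n) * bytes_per_element
    return flops / bytes_moved


def roofline(peak_flops, mem_bandwidth, intensity):
    """Attainable FLOP/s under the roofline model."""
    return min(peak_flops, mem_bandwidth * intensity)


# Hypothetical accelerator, chosen only for illustration.
PEAK_FLOPS = 10e12       # 10 TFLOP/s
MEM_BANDWIDTH = 500e9    # 500 GB/s

for name, (m, n, k) in {
    "matrix-matrix 1024^3": (1024, 1024, 1024),
    "matrix-vector 1024x1024": (1024, 1, 1024),
}.items():
    ai = gemm_arithmetic_intensity(m, n, k)
    perf = roofline(PEAK_FLOPS, MEM_BANDWIDTH, ai)
    bound = "compute" if perf == PEAK_FLOPS else "memory"
    print(f"{name}: {ai:.2f} FLOP/byte, {perf / 1e12:.2f} TFLOP/s ({bound}-bound)")
```

Under these assumptions the large matrix-matrix product sits on the compute roof while the matrix-vector product is limited by memory bandwidth, which is one reason batch size matters for accelerator utilization.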

Guest Lecturers

Course-Related Slides

Lecture01: Deep Learning Challenge. Is There Theory? (Donoho/Monajemi/Papyan)
https://cs217.github.io/assets/lectures/StanfordStats385-20170927-Lecture01-Donoho.pdf
Lecture02: Overview of Deep Learning From a Practical Point of View (Donoho/Monajemi/Papyan)
https://cs217.github.io/assets/lectures/Lecture-02-AsCorrected.pdf
Lecture03: Harmonic Analysis of Deep Convolutional Neural Networks (Helmut Bolcskei)
https://cs217.github.io/assets/lectures/bolcskei-stats385-slides.pdf
Lecture04: Convnets from First Principles: Generative Models, Dynamic Programming & EM (Ankit Patel)
https://cs217.github.io/assets/lectures/2017%20Stanford%20Guest%20Lecture%20-%20Stats%20385%20-%20Oct%202017.pdf
Lecture05: When Can Deep Networks Avoid the Curse of Dimensionality and Other Theoretical Puzzles (Tomaso Poggio)
https://cs217.github.io/assets/lectures/StanfordStats385-20171025-Lecture05-Poggio.pdf
Lecture06: Views of Deep Networks from Reproducing Kernel Hilbert Spaces (Zaid Harchaoui)
https://cs217.github.io/assets/lectures/lecture6_stats385_stanford_nov17.pdf
Lecture07: Understanding and Improving Deep Learning With Random Matrix Theory (Jeffrey Pennington)
https://cs217.github.io/assets/lectures/Understanding_and_improving_deep_learing_with_random_matrix_theory.pdf
Lecture08: Topology and Geometry of Half-Rectified Network Optimization (Joan Bruna)
https://cs217.github.io/assets/lectures/stanford_nov15.pdf
Lecture09: What’s Missing from Deep Learning? (Bruno Olshausen)
https://cs217.github.io/assets/lectures/lecture-09--20171129.pdf
Lecture10: Convolutional Neural Networks in View of Sparse Coding (Vardan Papyan)
https://cs217.github.io/assets/lectures/lecture-10--20171206.pdf