Tutorial | From Scratch: A Quick Guide to Deploying TensorFlow Machine Learning Models


January 31, 2018 · 机器之心 (Synced)

Selected from Hive Blog

Author: Bowei

Compiled by 机器之心 (Synced)

Contributors: Li Yazhou, Li Zenan

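This article introduces a quick way to deploy a trained machine learning model to production. It is meant for you if you have already trained an ML model with a deep learning framework such as TensorFlow or Caffe, want to put it up as a demo, and prefer a lightweight solution.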

GitHub repository: https://github.com/hiveml/simple-ml-serving

It contains the following items:

Check the TensorFlow installation: https://github.com/hiveml/simple-ml-serving/blob/master/test/test_tensorflow.sh
Run online classification from stdin: https://github.com/hiveml/simple-ml-serving/blob/master/test/test_label_image.sh
Run online classification on localhost: https://github.com/hiveml/simple-ml-serving/blob/master/test/test_tf_classify_server.sh
Put the classifiers behind a hardcoded proxy: https://github.com/hiveml/simple-ml-serving/blob/master/test/test_basic_proxy.sh
Put the classifiers behind a proxy with service discovery: https://github.com/hiveml/simple-ml-serving/blob/master/test/test_seaport_proxy.sh
Run the classifiers using a pseudo-DNS: https://github.com/hiveml/simple-ml-serving/blob/master/test/test_p2p_proxy.sh

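Machine learning in production

When we first entered the machine learning space at Hive, we already had millions of ground-truth labeled images. That let us train a state-of-the-art deep convolutional image classification model from scratch (that is, from random weights), specialized for our use case, in under a week. The more typical ML use case involves only hundreds of images, and for that I recommend fine-tuning an existing model. For example, https://www.tensorflow.org/tutorials/image_retraining has a tutorial on fine-tuning an ImageNet model to classify a sample dataset of flowers (3647 images, 5 classes).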

After installing Bazel and TensorFlow, run the following commands. The build takes about 30 minutes and training about 5 minutes:

    # Run from the root of the TensorFlow source tree.
    (
    cd "$HOME" && \
    curl -O http://download.tensorflow.org/example_images/flower_photos.tgz && \
    tar xzf flower_photos.tgz ;
    ) && \
    bazel build tensorflow/examples/image_retraining:retrain \
              tensorflow/examples/image_retraining:label_image \
    && \
    bazel-bin/tensorflow/examples/image_retraining/retrain \
      --image_dir "$HOME"/flower_photos \
      --how_many_training_steps=200 \
    && \
    bazel-bin/tensorflow/examples/image_retraining/label_image \
      --graph=/tmp/output_graph.pb \
      --labels=/tmp/output_labels.txt \
      --output_layer=final_result:0 \
      --image="$HOME"/flower_photos/daisy/21652746_cc379e0eea_m.jpg

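Alternatively, if you have Docker, you can use the prebuilt Docker image:

    sudo docker run -it --net=host liubowei/simple-ml-serving:latest /bin/bash

    >>> cat test.sh && bash test.sh

The first command puts you into an interactive shell inside the container; the second, shown after the >>> prompt, prints test.sh and runs it, which executes the commands above. You can also follow the rest of this article inside the container.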

TensorFlow has now saved the model information to /tmp/output_graph.pb and /tmp/output_labels.txt, which are passed as command-line arguments to the label_image.py script (https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/examples/image_retraining/label_image.py). Google's image recognition tutorial also links to another inference script (https://github.com/tensorflow/models/blob/master/tutorials/image/imagenet/classify_image.py#L130), but in this example we will stick with label_image.py.

Converting one-shot inference to online inference (TensorFlow)

If all we want is to accept file names from standard input, one per line, then "online" inference is easy:

    while read line ; do
    bazel-bin/tensorflow/examples/image_retraining/label_image \
    --graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt \
    --output_layer=final_result:0 \
    --image="$line" ;
    done

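From a performance standpoint, though, this is terrible: for every input sample we reload the neural network, the weights, the entire TensorFlow framework, and Python itself!

We can certainly do better. Let's start by editing the label_image.py script. It is located at bazel-bin/tensorflow/examples/image_retraining/label_image.runfiles/org_tensorflow/tensorflow/examples/image_retraining/label_image.py.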

We change the following lines (lines 141-142 of the script):

      run_graph(image_data, labels, FLAGS.input_layer, FLAGS.output_layer,
                FLAGS.num_top_predictions)

to:

      for line in sys.stdin:
        # strip the trailing newline so the file name can be opened
        run_graph(load_image(line.strip()), labels, FLAGS.input_layer, FLAGS.output_layer,
                  FLAGS.num_top_predictions)

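This is much faster, but it is still not the best we can do!

The reason is the with tf.Session() as sess construct on line 100. In essence, TensorFlow loads all of the computation into memory every time run_graph is called. This becomes obvious once you try inference on a GPU: you can watch GPU memory rise and fall as TensorFlow loads the model parameters onto the GPU and unloads them again. As far as I know, this construct does not exist in other ML frameworks such as Caffe or PyTorch.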

The solution is to pull the with statement out and pass a sess variable into run_graph:

    def run_graph(image_data, labels, input_layer_name, output_layer_name,
                  num_top_predictions, sess):
      # Feed the image_data as input to the graph.
      #   predictions will contain a two-dimensional array, where one
      #   dimension represents the input image count, and the other has
      #   predictions per class
      softmax_tensor = sess.graph.get_tensor_by_name(output_layer_name)
      predictions, = sess.run(softmax_tensor, {input_layer_name: image_data})
      # Sort to show labels in order of confidence
      top_k = predictions.argsort()[-num_top_predictions:][::-1]
      for node_id in top_k:
        human_string = labels[node_id]
        score = predictions[node_id]
        print('%s (score = %.5f)' % (human_string, score))
      # numpy floats are not json serializable, have to run item
      return [ (labels[node_id], predictions[node_id].item()) for node_id in top_k ]

    ...

      # the session is now created once, outside the per-image loop
      with tf.Session() as sess:
        for line in sys.stdin:
          run_graph(load_image(line.strip()), labels, FLAGS.input_layer, FLAGS.output_layer,
                    FLAGS.num_top_predictions, sess)

Code: https://github.com/hiveml/simple-ml-serving/blob/master/label_image.py

If you run this, you should find that each image takes about 0.1 seconds, which is fast enough for online use.
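
The principle behind this change, paying the model-loading cost once and reusing the loaded state for every input, can be illustrated without TensorFlow. The sketch below is a stand-in only: `FakeModel`, its `load_count` counter, and `classify_stream` are hypothetical names invented for illustration and are not part of label_image.py.

```python
import io


class FakeModel:
    """Hypothetical stand-in for an expensive-to-load model (not TensorFlow)."""

    load_count = 0  # how many times the "weights" were loaded

    def __init__(self):
        # In a real system this is where the graph and weights would be read.
        FakeModel.load_count += 1

    def predict(self, filename):
        # Placeholder prediction: a real model would decode and classify the image.
        return "label-for-" + filename


def classify_stream(lines):
    """Load the model once, then classify every file name read from `lines`."""
    model = FakeModel()  # one-time cost, paid before the loop
    results = []
    for line in lines:
        filename = line.strip()  # drop the trailing newline from each input line
        if filename:
            results.append(model.predict(filename))
    return results


if __name__ == "__main__":
    fake_stdin = io.StringIO("a.jpg\nb.jpg\nc.jpg\n")
    print(classify_stream(fake_stdin))
    print("model loads:", FakeModel.load_count)
```

However many lines arrive, the constructor runs a single time, which is the property the sess refactoring above is after.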

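Converting one-shot inference to online inference (other ML frameworks)

Caffe uses its net.forward code; see: http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb.
Mxnet is also unusual: it actually has ready-to-use inference server code publicly available: https://github.com/awslabs/mxnet-model-server.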

Deployment

The plan is to wrap this code in a Flask app. Flask is a lightweight Python web framework that lets you run an HTTP API server with very little work.

As a quick reference, here is a Flask app that accepts POST requests with multipart/form-data:

    #!/usr/bin/env python
    # usage: python echo.py to launch the server ; and then in another session, do
    # curl -v -XPOST 127.0.0.1:12480 -F "data=@./image.jpg"
    from flask import Flask, request

    app = Flask(__name__)

    @app.route('/', methods=['POST'])
    def classify():
        try:
            # read the uploaded file from the multipart field named 'data'
            data = request.files.get('data').read()
            print(repr(data)[:1000])
            return data, 200
        except Exception as e:
            return repr(e), 500

    app.run(host='127.0.0.1', port=12480)
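
For readers who would rather call the server from Python than from curl, the same request can be built with the standard library alone. This is a sketch under assumptions: the helper names `encode_multipart` and `post_image` are my own, and the field name `data` and port 12480 simply mirror the curl command in the usage comment above.

```python
import urllib.request
import uuid


def encode_multipart(field, filename, payload):
    """Build a multipart/form-data body holding one file field.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="{f}"; filename="{n}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).format(b=boundary, f=field, n=filename).encode("utf-8")
    tail = "\r\n--{b}--\r\n".format(b=boundary).encode("utf-8")
    return head + payload + tail, "multipart/form-data; boundary=" + boundary


def post_image(url, path):
    """POST the file at `path` to `url` as the form field named 'data'."""
    with open(path, "rb") as f:
        body, content_type = encode_multipart("data", "image.jpg", f.read())
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", content_type)
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

With the server running, `post_image("http://127.0.0.1:12480/", "./image.jpg")` would be the counterpart of the curl call.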

And here is the corresponding Flask app, hooked up to the run_graph function above:

    #!/usr/bin/env python
    # usage: bash tf_classify_server.sh
    from flask import Flask, request
    import tensorflow as tf
    import label_image as tf_classify
    import json

    app = Flask(__name__)
    FLAGS, unparsed = tf_classify.parser.parse_known_args()
    labels = tf_classify.load_labels(FLAGS.labels)
    tf_classify.load_graph(FLAGS.graph)
    sess = tf.Session()  # created once and shared by every request

    @app.route('/', methods=['POST'])
    def classify():
        try:
            data = request.files.get('data').read()
            result = tf_classify.run_graph(data, labels, FLAGS.input_layer, FLAGS.output_layer, FLAGS.num_top_predictions, sess)
            return json.dumps(result), 200
        except Exception as e:
            return repr(e), 500

    app.run(host='127.0.0.1', port=12480)

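This looks quite good, except that Flask and TensorFlow are both fully synchronous: Flask handles one request at a time in the order received, and TensorFlow fully occupies the thread while it classifies an image.

As written, the speed bottleneck is probably still the actual computation, so upgrading the Flask wrapper code would not gain much. This code may well be sufficient for your load. There are two obvious ways to increase request throughput: scale horizontally by increasing the number of workers (covered in the next section), or scale vertically by using a GPU and batching logic. The latter requires a web server that can hold multiple pending requests at once and decide whether to keep waiting for a larger batch or to send the current one to the TensorFlow graph thread for classification, a job for which this Flask app is entirely unsuited. Two options are Twisted + Klein, which keeps the code in Python, or Node.js + ZeroMQ if you prefer first-class event loop support and the ability to hook into non-Python ML frameworks such as Torch.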
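
The batching decision described above (keep waiting for a larger batch, or send what is pending now) can be sketched with a queue and a deadline. This is a minimal illustration, not the author's implementation: `collect_batch`, `max_batch`, and `max_wait` are hypothetical names, and a real server would run this in a thread that feeds the TensorFlow session.

```python
import queue
import time


def collect_batch(pending, max_batch=8, max_wait=0.05):
    """Gather up to `max_batch` items from `pending`, waiting at most `max_wait` seconds.

    Blocks until the first item arrives, then keeps collecting until the
    batch is full or the deadline passes, whichever comes first.
    """
    batch = [pending.get()]  # block for the first request
    deadline = time.monotonic() + max_wait
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(pending.get(timeout=remaining))
        except queue.Empty:
            break  # deadline reached with a partial batch
    return batch


if __name__ == "__main__":
    q = queue.Queue()
    for name in ["a.jpg", "b.jpg", "c.jpg"]:
        q.put(name)
    print(collect_batch(q, max_batch=2))  # a full batch of two
    print(collect_batch(q, max_batch=2))  # a partial batch once the wait expires
```

The two parameters express the trade-off directly: a larger `max_batch` favors GPU utilization, while a smaller `max_wait` bounds the latency added to each request.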

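Scaling up: load balancing and service discovery

We now have a single server serving our model, but it may be too slow, or our load may be too high. We would like to run more of these servers, so how do we distribute requests across them? The usual approach is to add a proxy layer, such as haproxy or nginx, which balances the load between the backend servers while presenting a single uniform interface to the client. Here is sample code that runs a rudimentary Node.js load balancer http proxy:

    // Usage : node basic_proxy.js WORKER_PORT_0,WORKER_PORT_1,...
    // an absent or empty argument yields an empty list instead of a crash
    const worker_ports = (process.argv[2] || '').split(',').filter(Boolean)
    if (worker_ports.length === 0) { console.error('missing worker ports') ; process.exit(1) }

    const proxy = require('http-proxy').createProxyServer({})
    proxy.on('error', () => console.log('proxy error'))

    let i = 0
    require('http').createServer((req, res) => {
      // round-robin: each request goes to the next worker port in turn
      proxy.web(req,res, {target: 'http://localhost:' + worker_ports[ (i++) % worker_ports.length ]})
    }).listen(12480)
    console.log(`Proxying localhost:${12480} to [${worker_ports.toString()}]`)

    // spin up the ML workers
    const { exec } = require('child_process')
    worker_ports.map(port => exec(`/bin/bash ./tf_classify_server.sh ${port}`))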
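
The rotation rule in the proxy above, indexing the worker list with a counter modulo its length, is plain round-robin. For readers following along in Python, the same selection logic can be sketched as below. `round_robin` and the example ports are hypothetical, and this snippet only picks targets; it does no proxying.

```python
import itertools


def round_robin(ports):
    """Return a function that yields the next backend URL on each call."""
    if not ports:
        raise ValueError("missing worker ports")
    cycle = itertools.cycle(ports)

    def next_target():
        return "http://localhost:{}".format(next(cycle))

    return next_target


if __name__ == "__main__":
    pick = round_robin([12482, 12483])  # example ports, chosen arbitrarily
    for _ in range(4):
        print(pick())
```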

To detect automatically how many backend servers are up and where they are, people generally use a "service discovery" tool, which may be bundled with the load balancer or run separately. Well-known examples include Consul and Zookeeper. Setting up and learning such a tool is beyond the scope of this article, so I have included a very rudimentary proxy built on the node.js service discovery package seaport. Proxy code:

    // Usage : node seaport_proxy.js
    const seaportServer = require('seaport').createServer()
    seaportServer.listen(12481)
    const proxy = require('http-proxy').createProxyServer({})
    proxy.on('error', () => console.log('proxy error'))

    let i = 0
    require('http').createServer((req, res) => {
      // look up the currently registered workers, then forward round-robin
      seaportServer.get('tf_classify_server', worker_ports => {
        const this_port = worker_ports[ (i++) % worker_ports.length ].port
        proxy.web(req,res, {target: 'http://localhost:' + this_port })
      })
    }).listen(12480)
    console.log(`Seaport proxy listening on ${12480} to '${'tf_classify_server'}' servers registered to ${12481}`)

Worker code:

    // Usage : node tf_classify_server.js
    // register with the seaport server and receive a port to listen on
    const port = require('seaport').connect(12481).register('tf_classify_server')
    console.log(`Launching tf classify worker on ${port}`)
    require('child_process').exec(`/bin/bash ./tf_classify_server.sh ${port}`)
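
Conceptually, the proxy and the workers above share a small registry: workers announce themselves under a role name, and the proxy looks up whoever is currently registered. The following toy in-memory model illustrates those moving parts only. It is not the seaport API, and `Registry`, `register`, `unregister`, and `next_port` are names made up here.

```python
import threading


class Registry:
    """Toy in-memory service registry (hypothetical; not the seaport API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._services = {}  # role name -> list of ports
        self._counters = {}  # role name -> number of lookups so far

    def register(self, role, port):
        """Called by a worker when it starts up."""
        with self._lock:
            self._services.setdefault(role, []).append(port)

    def unregister(self, role, port):
        """Called when a worker shuts down or is found to be dead."""
        with self._lock:
            self._services.get(role, []).remove(port)

    def next_port(self, role):
        """Return the next registered port for `role`, rotating round-robin."""
        with self._lock:
            ports = self._services.get(role)
            if not ports:
                raise LookupError("no workers registered for " + role)
            i = self._counters.get(role, 0)
            self._counters[role] = i + 1
            return ports[i % len(ports)]
```

Unlike the hardcoded proxy, the list of workers here can grow and shrink while the proxy is running, which is the point of service discovery.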

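However, when applied to machine learning, this setup runs into a bandwidth problem.

At anywhere from tens to hundreds of images per second, the system becomes bottlenecked on network bandwidth. In the current setup, all of the data has to pass through our single seaport master, which is also the single endpoint presented to the client.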
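
A rough back-of-the-envelope calculation shows how quickly this adds up. The figures below are assumptions chosen for illustration (the article gives no image size), and `proxy_bandwidth_mbit` is a hypothetical helper.

```python
def proxy_bandwidth_mbit(images_per_sec, image_bytes):
    """Approximate traffic through a forwarding proxy, in megabits per second.

    Each image crosses the proxy twice: once arriving from the client and
    once leaving for the worker. Responses are assumed small and ignored.
    """
    bits_per_image = image_bytes * 8
    return 2 * images_per_sec * bits_per_image / 1e6


if __name__ == "__main__":
    # Assumed figures for illustration only: 200 images/s at 500 kB each.
    print(proxy_bandwidth_mbit(200, 500000), "Mbit/s")
```

Under these assumed numbers the proxy would handle about 1.6 Gbit/s of combined inbound and outbound traffic, all of it on one machine.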

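To solve this, clients should not hit the single endpoint at http://127.0.0.1:12480 but should instead rotate automatically among the backend servers. If you know some networking, this sounds very much like a job for DNS.

Setting up a custom DNS server, however, is beyond the scope of this article. Instead, by changing the client to follow a two-step "manual DNS" protocol, we can reuse our rudimentary seaport proxy to implement a "peer-to-peer" protocol in which clients connect directly to the servers: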

Proxy code:

    // Usage : node p2p_proxy.js
    const seaportServer = require('seaport').createServer()
    seaportServer.listen(12481)

    let i = 0
    require('http').createServer((req, res) => {
      seaportServer.get('tf_classify_server', worker_ports => {
        const this_port = worker_ports[ (i++) % worker_ports.length ].port
        // reply with the worker's port instead of forwarding the request
        res.end(`${this_port}\n`)
      })
    }).listen(12480)
    console.log(`P2P seaport proxy listening on ${12480} to 'tf_classify_server' servers registered to ${12481}`)

(The worker code is the same as above.)

Client code:

    curl -v -XPOST localhost:`curl localhost:12480` -F"data=@$HOME/flower_photos/daisy/21652746_cc379e0eea_m.jpg"
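
The two steps hidden in that one-line curl command can be spelled out in Python. This is a sketch assuming the proxy replies with a bare port number followed by a newline, as the proxy code above does; `worker_url` and `discover_worker` are hypothetical helper names, and 12345 is an arbitrary example port.

```python
import urllib.request


def worker_url(proxy_reply, host="localhost"):
    """Turn the proxy's plain-text reply (a port number) into a worker URL."""
    port = int(proxy_reply.strip())  # the proxy answers with e.g. "12345\n"
    return "http://{}:{}/".format(host, port)


def discover_worker(proxy="http://localhost:12480/"):
    """Step 1: ask the proxy which worker to use."""
    with urllib.request.urlopen(proxy) as resp:
        return worker_url(resp.read().decode("ascii"))


if __name__ == "__main__":
    # Step 2 would POST the image directly to the returned URL, bypassing the proxy.
    print(worker_url("12345\n"))
```

Only the small lookup request touches the proxy; the image itself travels straight from the client to the worker.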

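Conclusion and further reading

At this point you should have something up and running, but it is certainly not future-proof. Several important topics were not covered in this article: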

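Automated deployment and setup on new hardware

On your own hardware, notable tools include Openstack/VMware, Chef/Puppet for installing Docker and handling network routes, and Docker for installing TensorFlow, Python, and everything else.
In the cloud, Kubernetes or Marathon/Mesos are both excellent.

Model version management

Managing models by hand is not too difficult at first.
TensorFlow Serving is a good tool for this problem, and it also handles batching and overall deployment very thoroughly. The downsides are that it is somewhat hard to set up and to write client code for, and it does not support Caffe/PyTorch.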
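
As one example of what managing versions by hand might look like, the sketch below picks the newest model from a directory of numbered versions. The layout (root/1, root/2, ...) and the helper name `latest_version_dir` are assumptions made for illustration; the article does not prescribe a scheme.

```python
import os
import tempfile


def latest_version_dir(root):
    """Return the path of the highest-numbered subdirectory of `root`.

    Assumes a layout such as root/1, root/2, ... where each directory
    holds one exported model; non-numeric names are ignored.
    """
    versions = [
        name for name in os.listdir(root)
        if name.isdigit() and os.path.isdir(os.path.join(root, name))
    ]
    if not versions:
        raise FileNotFoundError("no model versions under " + root)
    return os.path.join(root, max(versions, key=int))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for v in ("1", "2", "10"):
            os.mkdir(os.path.join(root, v))
        print(os.path.basename(latest_version_dir(root)))
```

Comparing versions as integers rather than strings matters here: as text, "2" would sort after "10".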

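How do I migrate machine learning code off Matlab?

Do not use Matlab in production (translator's note: this reflects only the author's view).

GPU drivers, Cuda, CUDNN

Use NVIDIA's containers (nvidia-docker) and look for Dockerfiles online.

Post-processing layers. Once you have several different ML models in production, you may want to mix and match them for different use cases: for example, run model A only if model B is inconclusive, or run model C in Caffe and pass its results to model D running in TensorFlow.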
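
The "run model A only if model B is inconclusive" pattern can be sketched as a small wrapper. Everything here is hypothetical: `cascade`, the 0.8 threshold, and the stand-in models are illustrative choices, and the models are assumed to be callables returning a (label, score) pair.

```python
def cascade(primary, fallback, threshold=0.8):
    """Build a classifier that consults `fallback` only when `primary` is unsure.

    Both arguments are callables returning a (label, score) pair.
    """
    def classify(image):
        label, score = primary(image)
        if score >= threshold:
            return label, score
        return fallback(image)  # primary was inconclusive
    return classify


if __name__ == "__main__":
    # Hypothetical stand-in models that return fixed answers.
    model_b = lambda image: ("daisy", 0.55)
    model_a = lambda image: ("tulip", 0.91)
    print(cascade(model_b, model_a)("photo.jpg"))
```

Because the models are passed in as plain callables, each one could just as well be an HTTP call to a separate worker running a different framework.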

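Original article: https://thehive.ai/blog/simple-ml-serving

This article was compiled by 机器之心 (Synced). Please contact this official account for authorization before reprinting.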
