Machine Learning at Microsoft with ML .NET

Zeeshan Ahmed, Saeed Amizadeh, Mikhail Bilenko, Rogan Carr, Wei-Sheng Chin, Yael Dekel, Xavier Dupre, Vadim Eksarevskiy, Eric Erhardt, Costin Eseanu, Senja Filipi, Tom Finley, Abhishek Goswami, Monte Hoover, Scott Inglis, Matteo Interlandi, Shon Katzenberger, Najeeb Kazmi, Gleb Krivosheev, Pete Luferenko, Ivan Matantsev, Sergiy Matusevych, Shahab Moradi, Gani Nazirov, Justin Ormont, Gal Oshri, Artidoro Pagnoni, Jignesh Parmar, Prabhat Roy, Sarthak Shah, Mohammad Zeeshan Siddiqui, Markus Weimer, Shauheen Zahirazami, Yiwen Zhu

Machine Learning is transitioning from an art and science into a technology available to every developer. In the near future, every application on every platform will incorporate trained models to encode data-based decisions that would be impossible for developers to author. This presents a significant engineering challenge, since currently data science and modeling are largely decoupled from standard software development processes. This separation makes incorporating machine learning capabilities inside applications unnecessarily costly and difficult, and furthermore discourages developers from embracing ML in the first place. In this paper we present ML .NET, a framework developed at Microsoft over the last decade in response to the challenge of making it easy to ship machine learning models in large software applications. We present its architecture and illuminate the application demands that shaped it. Specifically, we introduce DataView, the core data abstraction of ML .NET, which allows it to capture full predictive pipelines efficiently and consistently across training and inference lifecycles. We close the paper with a surprisingly favorable performance study of ML .NET compared to more recent entrants, and a discussion of some lessons learned.
