In this work we detail a novel open-source library, MMLSpark, that combines the flexible deep learning library Cognitive Toolkit with the distributed computing framework Apache Spark. To achieve this, we contributed Java language bindings to the Cognitive Toolkit and added several new components to the Spark ecosystem. We also integrate the popular image processing library OpenCV with Spark. In addition, we present a tool that automatically generates PySpark wrappers from any SparkML estimator, and we use this tool to expose all of our work to the PySpark ecosystem. Finally, we provide a large library of tools for working and developing within the Spark ecosystem. We apply this work to the automated classification of snow leopards in camera trap images, and we provide an end-to-end solution for the non-profit conservation organization the Snow Leopard Trust.