Despite the prevalence of machine learning in network traffic analysis tasks, from application identification to intrusion detection, the aspects of the machine learning pipeline that ultimately determine model performance (feature selection and representation, model selection, and parameter tuning) remain manual and painstaking. This paper presents a method to automate these steps. We introduce nPrint, a tool that generates a unified packet representation that is amenable to representation learning and model training. We integrate nPrint with automated machine learning (AutoML), resulting in nPrintML, a pipeline that can quickly automate many network traffic analysis tasks. nPrintML often outperforms the best known results for existing problems while automating many manual steps of the process. We have released nPrint, nPrintML, and the corresponding datasets from our evaluation to enable future work to build on these methods.
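As a rough illustration of what a unified packet representation can look like, the sketch below maps variable-length header bytes to a fixed-length vector of bits. It is not nPrint's actual format or interface: the function names, the -1 sentinel for missing bits, and the sample bytes are all assumptions made for this example.

```python
from typing import List

# Hypothetical sentinel meaning "this bit is not present in the packet".
ABSENT = -1


def bytes_to_bits(data: bytes) -> List[int]:
    """Expand bytes into a list of 0/1 ints, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def to_fixed_width(header: bytes, max_bytes: int) -> List[int]:
    """Map a variable-length header to a fixed-length feature vector.

    Bytes beyond max_bytes are truncated; positions past the end of the
    header are filled with ABSENT, so every packet yields exactly
    max_bytes * 8 features.
    """
    bits = bytes_to_bits(header[:max_bytes])
    return bits + [ABSENT] * (max_bytes * 8 - len(bits))


# Two made-up headers of different lengths (illustrative bytes only).
packets = [bytes([0x45, 0x00]), bytes([0x60])]
features = [to_fixed_width(p, max_bytes=2) for p in packets]
```

Because every row of `features` has the same length and the same meaning per position, the rows can be stacked into a matrix and handed to a standard model-training or AutoML library without hand-crafted feature engineering.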