Machine learning algorithms are gaining prominence in traffic identification research because they offer a way to overcome the shortcomings of port-based identification and deep packet inspection (DPI), especially for P2P-based applications such as Skype. However, recent studies have focused mainly on traffic identification using full-packet datasets, an approach that poses great challenges for identifying online network traffic. This study proposes a new flow identification algorithm that operates on sampled flow records. It constructs flow records from a Skype traffic set as the dataset, uses the inherent NetFlow metrics and extended flow metrics as features, and applies a fast correlation-based filter (FCBF) algorithm to select highly correlated features. It also proposes a new NFI method that adopts a Bayesian updating mechanism to refine the classifier model. Experimental results show that the proposed scheme achieves substantially better identification performance than existing state-of-the-art traffic identification methods, with higher identification accuracy and fewer false positives and false negatives. The behavior of a typical feature metric in the sampling environment is also analyzed.
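The feature-selection step relies on FCBF, which ranks features by their symmetric uncertainty (SU) with the class and then discards any feature that is more strongly correlated with an already-selected feature than with the class. The sketch below illustrates that generic procedure only. It assumes that the flow metrics have already been discretized into bins, and the threshold `delta`, the helper names, and the toy feature names are hypothetical. It does not reproduce the study's actual feature set or implementation.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def symmetric_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * IG(X|Y) / (H(X) + H(Y)), with IG = H(X) + H(Y) - H(X, Y)."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0.0:
        return 0.0  # both variables constant: treat as carrying no information
    h_xy = entropy(list(zip(x, y)))  # joint entropy over (x, y) pairs
    return 2.0 * (h_x + h_y - h_xy) / (h_x + h_y)


def fcbf(features: Dict[str, Sequence[Hashable]],
         labels: Sequence[Hashable],
         delta: float = 0.01) -> List[str]:
    """Generic FCBF sketch; `delta` is a hypothetical relevance threshold."""
    # Step 1: relevance of each feature to the class label.
    relevance = {name: symmetric_uncertainty(col, labels)
                 for name, col in features.items()}
    candidates = [name for name, su in relevance.items() if su >= delta]
    # Step 2: order candidates by decreasing relevance (stable for ties).
    candidates.sort(key=lambda name: relevance[name], reverse=True)
    # Step 3: keep the most relevant remaining feature, then drop every later
    # feature F_q whose SU with it is >= SU(F_q, class) (i.e. redundant).
    selected: List[str] = []
    remaining = candidates
    while remaining:
        f_p = remaining[0]
        selected.append(f_p)
        remaining = [f_q for f_q in remaining[1:]
                     if symmetric_uncertainty(features[f_p], features[f_q])
                     < relevance[f_q]]
    return selected
```

For example, given toy labels `["skype"] * 3 + ["other"] * 3`, a binned packet-size column that matches the labels exactly, a relabeled copy of that column, and a constant protocol column, `fcbf` returns only the first packet-size column: the copy is removed as redundant, and the constant column has zero relevance.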