Network traffic classification (NTC) is vital for efficient network management, security, and performance optimization, particularly with 5G/6G technologies. Traditional methods, such as deep packet inspection (DPI) and port-based identification, struggle with the rise of encrypted traffic and dynamic port allocations. Supervised learning methods provide viable alternatives but rely on large labeled datasets, which are difficult to acquire given the diversity and volume of network traffic. Meanwhile, unsupervised learning methods, while less reliant on labeled data, often exhibit lower accuracy. To address these limitations, we propose a novel framework that first leverages Self-Supervised Learning (SSL) with techniques such as autoencoders or Tabular Contrastive Learning (TabCL) to generate pseudo-labels from extensive unlabeled datasets, addressing the challenge of limited labeled data. We then apply traffic-adopted Confident Learning (CL) to refine these pseudo-labels, enhancing classification precision by mitigating the impact of noise. Our proposed framework offers a generalizable solution that minimizes the need for extensive labeled data while delivering high accuracy. Extensive simulations and evaluations, conducted using three datasets (ISCX VPN-nonVPN, self-generated dataset, and UCDavis--QUIC), and demonstrate that our method achieves superior accuracy compared to state-of-the-art techniques in classifying network traffic.
翻译:暂无翻译