The last two decades have witnessed tremendous advances in Information and Communication Technologies. Besides improvements in computational power and storage capacity, communication networks now carry an amount of data that was not envisaged only a few years ago. As networks became pervasive, their complexity increased at the same pace, leaving operators and researchers with few instruments to understand what happens in their networks and, on a global scale, on the Internet. Fortunately, recent advances in data science and machine learning come to the rescue of network analysts, allowing analyses with a level of complexity and a spatial and temporal scope that were not possible only 10 years ago. In my thesis, I take the perspective of an Internet Service Provider (ISP) and illustrate the challenges and possibilities of analyzing the traffic of modern operational networks. I use big data and machine learning algorithms and apply them to datasets obtained from passive measurements of ISP and university campus networks. The marriage between data science and network measurements is complicated by the complexity of machine learning algorithms and by the intrinsic multi-dimensionality and variability of this kind of data. For this reason, my work proposes and evaluates novel techniques that are inspired by popular machine learning approaches but carefully tailored to operate on network traffic.