Artificial intelligence techniques have achieved strong performance in classifying Windows Portable Executable (PE) malware, but their reliability often degrades under dataset shift, leading to misclassifications with severe security consequences. To address this, we enhance an existing LightGBM (LGBM) malware detector by integrating neural networks (NN), PriorNet, and neural network ensembles, and evaluate the resulting detectors on three benchmark datasets: EMBER, BODMAS, and UCSB. The UCSB dataset, composed mainly of packed malware, introduces a substantial distributional shift relative to EMBER and BODMAS, making it a challenging testbed for robustness. We study uncertainty-aware decision strategies, including probability thresholding, PriorNet, ensemble-derived estimates, and Inductive Conformal Evaluation (ICE). Our main contribution is the use of ensemble-based uncertainty estimates as Non-Conformity Measures (NCMs) within ICE, combined with a novel threshold optimisation method. On the UCSB dataset, where the shift is most severe, the state-of-the-art probability-based ICE (SOTA) yields an incorrect acceptance rate (IA%) of 22.8%. In contrast, our method reduces this to 16%, a relative reduction of about 30%, while maintaining competitive correct acceptance rates (CA%). These results demonstrate that integrating ensemble-based uncertainty with conformal prediction provides a more reliable safeguard against misclassifications under extreme dataset shift, particularly in the presence of packed malware, thereby offering practical benefits for real-world security operations.
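To make the idea of using ensemble uncertainty as a non-conformity measure more concrete, the following is a minimal Python sketch and not the implementation evaluated in this work. It assumes that the uncertainty estimate is the standard deviation of the ensemble members' malware probabilities (the abstract does not specify which estimate is used), that a conformal p-value is the fraction of same-class calibration samples with equal or higher non-conformity, and that a prediction is accepted when its p-value reaches a per-class threshold. The function names `ensemble_ncm`, `conformal_p_values`, and `accept_mask`, as well as all numeric values, are hypothetical; the thresholds are hand-picked placeholders and do not reflect the threshold optimisation method mentioned above.

```python
import numpy as np


def ensemble_ncm(member_probs):
    """Turn ensemble outputs into predictions and non-conformity scores.

    member_probs: array of shape (n_members, n_samples) holding each
    member's malware probability. Assumption: disagreement between
    members (standard deviation) serves as the non-conformity score.
    """
    member_probs = np.asarray(member_probs, dtype=float)
    mean_prob = member_probs.mean(axis=0)
    preds = (mean_prob >= 0.5).astype(int)  # 1 = malware, 0 = benign
    ncm = member_probs.std(axis=0)          # higher = less conforming
    return preds, ncm


def conformal_p_values(cal_ncm, cal_labels, test_ncm, test_preds):
    """Fraction of same-class calibration samples at least as non-conforming."""
    cal_ncm = np.asarray(cal_ncm, dtype=float)
    cal_labels = np.asarray(cal_labels)
    p_values = np.empty(len(test_ncm))
    for i, (score, label) in enumerate(zip(test_ncm, test_preds)):
        reference = cal_ncm[cal_labels == label]
        # With no calibration data for this class, reject by default.
        p_values[i] = np.mean(reference >= score) if reference.size else 0.0
    return p_values


def accept_mask(p_values, test_preds, thresholds):
    """Accept a prediction when its p-value reaches the class threshold."""
    per_sample = np.array([thresholds[int(label)] for label in test_preds])
    return p_values >= per_sample


# Toy calibration set: non-conformity scores with their true labels.
cal_ncm = [0.01, 0.02, 0.05, 0.10, 0.02, 0.03, 0.04, 0.20]
cal_labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Three ensemble members scoring two test samples:
# the members agree on the first sample and disagree on the second.
test_probs = [[0.90, 0.20],
              [0.92, 0.90],
              [0.88, 0.70]]

preds, ncm = ensemble_ncm(test_probs)
p_values = conformal_p_values(cal_ncm, cal_labels, ncm, preds)
accepted = accept_mask(p_values, preds, thresholds={0: 0.25, 1: 0.25})
print(preds, p_values, accepted)
```

In this toy example both samples are predicted as malware, but only the first is accepted: the members disagree strongly on the second, so its non-conformity exceeds every malware calibration score and its p-value falls below the threshold. Under such a scheme, IA% and CA% would be measured over the accepted predictions only.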