Measuring function similarity is an effective way to detect bugs, but statements unrelated to the bugs introduce noise that can degrade detection performance. Existing work on suppressing this noise leaves the hardest part of the problem unsolved, namely eliminating the noise in the target code. In this paper, we propose MATUS, which mitigates target noise to enable precise similarity-based bug detection. Feature slices are extracted from both the buggy query and the targets to represent the semantic features of the (potential) bug logic. In particular, MATUS uses prior knowledge from the buggy code to guide target slicing, pinpointing the slicing criteria in the targets in an end-to-end manner. All feature slices are embedded and compared by vector similarity, and the resulting buggy candidates are audited to confirm previously unknown bugs in the targets. Experiments show that MATUS offers advantages in detecting bugs in real-world projects while maintaining acceptable efficiency. In total, MATUS has found 31 previously unknown bugs in the Linux kernel, all of which have been confirmed by the kernel developers; 11 of them have been assigned CVEs.
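The abstract does not specify the embedding model or the exact comparison rule used by MATUS. As a minimal sketch, assuming each feature slice has already been embedded as a fixed-length vector, the comparison step could rank target slices by cosine similarity to the buggy query slice and keep those above a threshold as candidates for auditing. The function names, the 0.8 threshold, and the choice of cosine similarity are illustrative assumptions, not MATUS's documented method.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    query: Sequence[float],
    targets: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, float]]:
    """Return (target name, score) pairs scoring at or above the threshold,
    most similar first. Hypothetical helper: assumes slices are pre-embedded."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in targets.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits


if __name__ == "__main__":
    # Toy 3-dimensional embeddings standing in for embedded feature slices.
    query_slice = [1.0, 0.0, 1.0]
    target_slices = {
        "func_a": [1.0, 0.0, 1.0],
        "func_b": [0.0, 1.0, 0.0],
        "func_c": [1.0, 0.2, 0.9],
    }
    for name, score in rank_candidates(query_slice, target_slices):
        print(f"{name}: {score:.3f}")
```

With these toy vectors, `func_a` ranks first, `func_c` second, and `func_b` (orthogonal to the query) is filtered out. A fixed threshold keeps the candidate list short enough for manual auditing; a real system would likely tune it, or use top-k ranking instead.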