Fuzzing is a highly effective method for uncovering software vulnerabilities, but analyzing the resulting data typically requires substantial manual effort. This is amplified by the fact that fuzzing campaigns often find a large number of crashing inputs, many of which share the same underlying bug. Crash deduplication is the task of identifying such duplicate crashing inputs and thereby reducing the amount of data that needs to be examined. Many existing deduplication approaches rely on comparing stack traces or other information collected when a program crashes. Although various metrics for measuring the similarity of such information have been proposed, many do not yield satisfactory deduplication results. In this work, we present GPTrace, a deduplication workflow that leverages a large language model to assess the similarity of various data sources associated with crashes: it computes embedding vectors from these data and supplies them as input to a clustering algorithm. We evaluate our approach on over 300 000 crashing inputs belonging to 50 ground-truth labels from 14 different targets. The deduplication results produced by GPTrace show a noticeable improvement over hand-crafted stack trace comparison methods, and even over more complex but less flexible state-of-the-art approaches.
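The embed-then-cluster workflow described above can be sketched as follows. This is a minimal, hypothetical illustration, not GPTrace itself: the LLM embedding call is replaced by a toy bag-of-tokens vector so the sketch runs standalone, and the clustering algorithm (unspecified in the abstract) is approximated by greedy cosine-similarity thresholding. The function names, the `threshold` parameter, and the tokenization are all assumptions for illustration.

```python
import math
from collections import Counter

def embed(trace: str, vocab: list[str]) -> list[float]:
    # Stand-in for the LLM embedding step: a bag-of-tokens count
    # vector over a shared vocabulary. A real pipeline would call an
    # embedding model on the stack trace / crash report text instead.
    counts = Counter(trace.split())
    return [float(counts[tok]) for tok in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster_crashes(traces: list[str], threshold: float = 0.7) -> list[list[int]]:
    # Greedy threshold clustering over the embeddings: each crash joins
    # the first cluster whose representative is similar enough,
    # otherwise it starts a new cluster. Ideally, one cluster then
    # corresponds to one underlying bug.
    vocab = sorted({tok for t in traces for tok in t.split()})
    vectors = [embed(t, vocab) for t in traces]
    clusters: list[tuple[list[float], list[int]]] = []
    for i, vec in enumerate(vectors):
        for rep, members in clusters:
            if cosine(rep, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]
```

For example, two use-after-free traces that differ only in a line number end up in one cluster, while an unrelated overflow trace forms its own cluster, so an analyst examines two groups instead of three raw crashes.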