Support for regular expressions in automated test generation and verification tools is lacking. Features of the regular expression engines found in mainstream programming languages, such as backreferences and greedy matching, are often ignored or imprecisely approximated, leading to poor test coverage or failed proofs. In this paper, we present the first complete strategy to faithfully reason about regular expressions in the context of symbolic execution, focusing on the operators found in JavaScript. We model regular expression operations using string constraints and classical regular expressions, and use a refinement scheme to address the problem of matching precedence and greediness. Our survey of over 400,000 JavaScript packages from the NPM software repository shows that one fifth make use of complex regular expression features. We implemented our model in a dynamic symbolic execution engine for JavaScript and evaluated it on over 1,000 Node.js packages containing regular expressions, demonstrating that the strategy is effective and can increase line coverage of programs by up to 30%.
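The two features named above, greedy matching and backreferences, can be illustrated with a small JavaScript sketch. It is not taken from the paper and does not show the paper's model; the patterns, inputs, and the `captureSummary` helper are hypothetical, chosen only to show why a membership test against a classical regular expression does not determine what a real engine captures.

```javascript
// Illustrative sketch only: patterns and inputs are made up for this example.

// Both patterns accept exactly the same strings (any run of 'a's),
// but matching precedence decides which group receives the characters.
const greedy = /^(a*)(a*)$/.exec("aaa"); // first group takes as much as it can
const lazy = /^(a*?)(a*)$/.exec("aaa");  // first group takes as little as it can

// A backreference: \1 must repeat exactly what group 1 captured,
// which a classical regular expression cannot express in general.
const backref = /^(a+)b\1$/;

// Hypothetical helper that gathers the observable results.
function captureSummary() {
  return {
    greedyGroups: [greedy[1], greedy[2]], // ["aaa", ""]
    lazyGroups: [lazy[1], lazy[2]],       // ["", "aaa"]
    backrefAccepts: backref.test("aabaa"), // true: "aa", "b", "aa"
    backrefRejects: backref.test("aaba"),  // false: the two halves differ
  };
}

console.log(captureSummary());
```

A program that branches on `greedy[1]` or `lazy[1]` behaves differently in the two cases even though both patterns accept the same input, which is why an analysis that tracks only acceptance can miss paths that depend on capture values.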