On the diversity and frequency of code related to mathematical formulas in real-world Java projects

In this paper, the term "formula code" refers to fragments of source code that implement a mathematical formula. We present empirical studies that analyze the diversity and frequency of formula code in open-source software projects. In an exploratory study, we investigated what kinds of formulas are implemented in real-world Java projects and derived syntactical patterns and constraints. We refined these patterns for sum and product formulas to automatically detect formula code in software archives and to reconstruct the implemented formula in mathematical notation. In a quantitative study of a large sample of engineered Java projects on GitHub, we analyzed the frequency of formula code and estimated that one in 700 lines of code in this sample implements a sum or product formula. For a sample of scientific-computing projects, we found that one in 100 lines of code implements a sum or product formula. To assess the need for tool support, we investigated the helpfulness of comments for program understanding in a sample of formula-code fragments and performed an online survey. Our findings provide first insights into the characteristics of formula code that can motivate further studies on the role of formula code in software projects and the design of formula-related tools.
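
To make the notion of sum and product formula code concrete, the following is a minimal illustrative sketch and is not taken from the studied projects. The class `FormulaCodeExample` and the methods `mean` and `factorial` are hypothetical names chosen for this example. The first loop implements the sum formula $\bar{x} = \frac{1}{n}\sum_{i=0}^{n-1} x_i$, and the second implements the product formula $n! = \prod_{i=1}^{n} i$.

```java
public final class FormulaCodeExample {

    // Hypothetical example of a sum formula:
    // mean(x) = (1/n) * sum_{i=0}^{n-1} x[i], assuming x is non-empty.
    static double mean(double[] x) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i]; // accumulate the summand x[i]
        }
        return sum / x.length;
    }

    // Hypothetical example of a product formula:
    // factorial(n) = prod_{i=1}^{n} i, assuming 0 <= n <= 20 so the result fits in a long.
    static long factorial(int n) {
        long product = 1L;
        for (int i = 1; i <= n; i++) {
            product *= i; // accumulate the factor i
        }
        return product;
    }

    public static void main(String[] args) {
        System.out.println(mean(new double[] {1.0, 2.0, 3.0})); // prints 2.0
        System.out.println(factorial(5));                       // prints 120
    }
}
```

In both methods, the accumulator variable, the loop bounds, and the compound assignment (`+=` or `*=`) together carry the information needed to write the loop back as a formula in mathematical notation. Syntactic features of this kind are plausible candidates for the patterns described above, although the exact patterns used in the studies are not given here.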
