The correctness of compilers is instrumental in the safety and reliability of other software systems, as bugs in compilers can produce executables that do not reflect the intent of programmers. Such errors are difficult to identify and debug. Random test program generators are commonly used in testing compilers, and they have been effective in uncovering bugs. However, the problem of guiding these test generators to produce test programs that are more likely to find bugs remains challenging. In this paper, we use the code snippets in the bug reports to guide the test generation. The main idea of this work is to extract insights from the bug reports about the language features that are more prone to inadequate implementation and using the insights to guide the test generators. We use the GCC C compiler to evaluate the effectiveness of this approach. In particular, we first cluster the test programs in the GCC bugs reports based on their features. We then use the centroids of the clusters to compute configurations for Csmith, a popular test generator for C compilers. We evaluated this approach on eight versions of GCC and found that our approach provides higher coverage and triggers more miscompilation failures than the state-of-the-art test generation techniques for GCC.

### 相关内容

GCC（GNU Compiler Collection，GNU 编译器套装），是一套由 GNU 开发的编程语言编译器。

Search-based approaches have been used in the literature to automate the process of creating unit test cases. However, related work has shown that generated unit-tests with high code coverage could be ineffective, i.e., they may not detect all faults or kill all injected mutants. In this paper, we propose CLING, an integration-level test case generation approach that exploits how a pair of classes, the caller and the callee, interact with each other through method calls. In particular, CLING generates integration-level test cases that maximize the Coupled Branches Criterion (CBC). Coupled branches are pairs of branches containing a branch of the caller and a branch of the callee such that an integration test that exercises the former also exercises the latter. CBC is a novel integration-level coverage criterion, measuring the degree to which a test suite exercises the interactions between a caller and its callee classes. We implemented CLING and evaluated the approach on 140 pairs of classes from five different open-source Java projects. Our results show that (1) CLING generates test suites with high CBC coverage, thanks to the definition of the test suite generation as a many-objectives problem where each couple of branches is an independent objective; (2) such generated suites trigger different class interactions and can kill on average 7.7% (with a maximum of 50%) of mutants that are not detected by tests generated at the unit level; (3) CLING can detect integration faults coming from wrong assumptions about the usage of the callee class (32 for our subject systems) that remain undetected when using automatically generated unit-level test suites.

Software testing is one of the very important Quality Assurance (QA) components. A lot of researchers deal with the testing process in terms of tester motivation and how tests should or should not be written. However, it is not known from the recommendations how the tests are actually written in real projects. In this paper the following was investigated: (i) the denotation of the test word in different natural languages; (ii) whether the test word correlates with the presence of test cases; and (iii) what testing frameworks are mostly used. The analysis was performed on 38 GitHub open source repositories thoroughly selected from the set of 4.3M GitHub projects. We analyzed 20,340 test cases in 803 classes manually and 170k classes using an automated approach. The results show that: (i) there exists weak correlation (r = 0.655) between the word test and test cases presence in a class; (ii) the proposed algorithm using static file analysis correctly detected 95\% of test cases; (iii) 15\% of the analyzed classes used main() function whose represent regular Java programs that test the production code without using any third-party framework. The identification of such tests is very low due to implementation diversity. The results may be leveraged to more quickly identify and locate test cases in a repository, to understand practices in customized testing solutions and to mine tests to improve program comprehension in the future.

Developers create software branches for tentative feature addition and bug fixing, and periodically merge branches to release software with new features or repairing patches. When the program edits from different branches textually overlap (i.e., textual conflicts), or the co-application of those edits lead to compilation or runtime errors (i.e., compiling or dynamic conflicts), it is challenging and time-consuming for developers to eliminate merge conflicts. Prior studies examined %the popularity of merge conflicts and how conflicts were related to code smells or software development process; tools were built to find and solve conflicts. However, some fundamental research questions are still not comprehensively explored, including (1) how conflicts were introduced, (2) how developers manually resolved conflicts, and (3) what conflicts cannot be handled by current tools. For this paper, we took a hybrid approach that combines automatic detection with manual inspection to reveal 204 merge conflicts and their resolutions in 15 open-source repositories. %in the version history of 15 open-source projects. Our data analysis reveals three phenomena. First, compiling and dynamic conflicts are harder to detect, although current tools mainly focus on textual conflicts. Second, in the same merging context, developers usually resolved similar textual conflicts with similar strategies. Third, developers manually fixed most of the inspected compiling and dynamic conflicts by similarly editing the merged version as what they did for one of the branches. Our research reveals the challenges and opportunities for automatic detection and resolution of merge conflicts; it also sheds light on related areas like systematic program editing and change recommendation.

Artificial Intelligence (AI) planning is a flourishing research and development discipline that provides powerful tools for searching a course of action that achieves some user goal. While these planning tools show excellent performance on benchmark planning problems, they represent challenging software systems when it comes to their use and integration in real-world applications. In fact, even in-depth understanding of their internal mechanisms does not guarantee that one can successfully set up, use and manipulate existing planning tools. We contribute toward alleviating this situation by proposing a service-oriented planning architecture to be at the core of the ability to design, develop and use next-generation AI planning systems. We collect and classify common planning capabilities to form the building blocks of the planning architecture. We incorporate software design principles and patterns into the architecture to allow for usability, interoperability and reusability of the planning capabilities. Our prototype planning system demonstrates the potential of our approach for rapid prototyping and flexibility of system composition. Finally, we provide insight into the qualitative advantages of our approach when compared to a typical planning tool.

We present a new secret sharing algorithm that provides the storage efficiency of an Information Dispersal Algorithm (IDA) while providing perfect secret sharing. We achieve this by mixing the input message with random bytes generated using Repeatable Random Sequence Generator (RRSG). We also use the data from the RRSG to provide random polynomial evaluation points and optionally compute the polynomials on random isomorphic fields rather than a single fixed field.

Recent advancements in natural language processing \cite{gpt2} \cite{BERT} have led to near-human performance in multiple natural language tasks. In this paper, we seek to understand whether similar techniques can be applied to a highly structured environment with strict syntax rules. Specifically, we propose an end-to-end machine learning model for code generation in the Python language built on-top of pre-trained language models. We demonstrate that a fine-tuned model can perform well in code generation tasks, achieving a BLEU score of 0.22, an improvement of 46\% over a reasonable sequence-to-sequence baseline. All results and related code used for training and data processing are available on GitHub.

We present Calyx, a new intermediate language (IL) for compiling high-level programs into hardware designs. Calyx combines a hardware-like structural language with a software-like control flow representation with loops and conditionals. This split representation enables a new class of hardware-focused optimizations that require both structural and control flow information which are crucial for high-level programming models for hardware design. The Calyx compiler lowers control flow constructs using finite-state machines and generates synthesizable hardware descriptions. We have implemented Calyx in an optimizing compiler that translates high-level programs to hardware. We demonstrate Calyx using two DSL-to-RTL compilers, a systolic array generator and one for a recent imperative accelerator language, and compare them to equivalent designs generated using high-level synthesis (HLS). The systolic arrays are $4.6\times$ faster and $1.1\times$ larger on average than HLS implementations, and the HLS-like imperative language compiler is within a few factors of a highly optimized commercial HLS toolchain. We also describe three optimizations implemented in the Calyx compiler.

Software Quality Assurance (SQA) planning aims to define proactive plans, such as defining maximum file size, to prevent the occurrence of software defects in future releases. To aid this, defect prediction models have been proposed to generate insights as the most important factors that are associated with software quality. Such insights that are derived from traditional defect models are far from actionable-i.e., practitioners still do not know what they should do or avoid to decrease the risk of having defects, and what is the risk threshold for each metric. A lack of actionable guidance and risk threshold can lead to inefficient and ineffective SQA planning processes. In this paper, we investigate the practitioners' perceptions of current SQA planning activities, current challenges of such SQA planning activities, and propose four types of guidance to support SQA planning. We then propose and evaluate our AI-Driven SQAPlanner approach, a novel approach for generating four types of guidance and their associated risk thresholds in the form of rule-based explanations for the predictions of defect prediction models. Finally, we develop and evaluate an information visualization for our SQAPlanner approach. Through the use of qualitative survey and empirical evaluation, our results lead us to conclude that SQAPlanner is needed, effective, stable, and practically applicable. We also find that 80% of our survey respondents perceived that our visualization is more actionable. Thus, our SQAPlanner paves a way for novel research in actionable software analytics-i.e., generating actionable guidance on what should practitioners do and not do to decrease the risk of having defects to support SQA planning.

Generative Adversarial Networks (GAN) have shown great promise in tasks like synthetic image generation, image inpainting, style transfer, and anomaly detection. However, generating discrete data is a challenge. This work presents an adversarial training based correlated discrete data (CDD) generation model. It also details an approach for conditional CDD generation. The results of our approach are presented over two datasets; job-seeking candidates skill set (private dataset) and MNIST (public dataset). From quantitative and qualitative analysis of these results, we show that our model performs better as it leverages inherent correlation in the data, than an existing model that overlooks correlation.

Pouria Derakhshanfar,Xavier Devroey,Annibale Panichella,Andy Zaidman,Arie van Deursen
0+阅读 · 2月23日
Matej Madeja,Jaroslav Porubän,Michaela Bačíková,Matúš Sulír,Ján Juhár,Sergej Chodarev,Filip Gurbáľ
0+阅读 · 2月23日
Bowen Shen,Cihan Xiao,Na Meng,Fei He
0+阅读 · 2月22日
Sebastian Graef,Ilche Georgievski
0+阅读 · 2月22日
Arvind Srinivasan,Chien-Chung Chan
0+阅读 · 2月21日
Luis Perez,Lizi Ottens,Sudharshan Viswanathan
0+阅读 · 2月21日
Tao Xiao,Dong Wang,Shane McIntosh,Hideaki Hata,Raula Gaikovina Kula,Takashi Ishio,Kenichi Matsumoto
0+阅读 · 2月19日
0+阅读 · 2月19日
Dilini Rajapaksha,Chakkrit Tantithamthavorn,Jirayus Jiarpakdee,Christoph Bergmeir,John Grundy,Wray Buntine
0+阅读 · 2月19日
Shreyas Patel,Ashutosh Kakadiya,Maitrey Mehta,Raj Derasari,Rahul Patel,Ratnik Gandhi
5+阅读 · 2018年4月3日

20+阅读 · 2020年9月18日

29+阅读 · 2020年5月4日

54+阅读 · 2019年11月17日

37+阅读 · 2019年10月9日

28+阅读 · 2019年9月29日

12+阅读 · 2019年7月1日
CreateAMind
20+阅读 · 2018年9月12日
Call4Papers
4+阅读 · 2018年3月13日
Top