Testing is widely recognized as an important stage of the software development lifecycle. Effective software testing can provide benefits such as documentation, bug finding, and preventing regressions. In particular, unit tests document a unit's \textit{intended} functionality. A \textit{test oracle}, typically expressed as a condition, documents the intended behavior of the unit under a given test prefix. Synthesizing a functional test oracle is a challenging problem, as it must capture the intended functionality rather than the implemented functionality. In this paper, we propose \toolName{} (\underline{N}eural \underline{U}nit \underline{T}est \underline{S}uggestions), a unified transformer-based neural approach to infer both exceptional and assertion test oracles based on the context of the focal method. Our approach can handle units with ambiguous or missing documentation, and even units with a missing implementation. We evaluate our approach on both oracle inference accuracy and functional bug finding. Our technique improves accuracy by 33\% over existing oracle inference approaches, achieving 96\% overall accuracy on a held-out test dataset. Furthermore, we show that when integrated with an automated test generation tool (EvoSuite), our approach finds 54 real-world bugs in large-scale Java programs, including \DISTINCTBUGS{} bugs that are not found by any other automated testing method in our evaluation.
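To make the two oracle kinds concrete, the sketch below shows what an assertion oracle and an exceptional oracle look like in an executable unit test. The `parse_port` unit and the tests are an illustrative example of ours (written in Python/pytest rather than the paper's Java/JUnit setting), not code from the paper.

```python
import pytest

# Hypothetical unit under test (illustrative only, not from the paper).
def parse_port(value: str) -> int:
    port = int(value)
    if not 0 < port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port

def test_parse_port_valid():
    # Test prefix: set up inputs and call the focal method ...
    result = parse_port("8080")
    # ... assertion oracle: a condition stating the intended return value.
    assert result == 8080

def test_parse_port_invalid():
    # Exceptional oracle: the intended behavior is to raise on out-of-range input.
    with pytest.raises(ValueError):
        parse_port("70000")
```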

### Related Content

Unit testing is a stage of testing where the smallest segment of code that can be tested in isolation from the rest of the system - often a class - is tested. Unit tests are typically written as executable code, often in a format provided by a unit testing framework such as pytest for Python. Creating unit tests is a time- and effort-intensive process with many repetitive, manual elements. To illustrate how AI can support unit testing, this chapter introduces the concept of search-based unit test generation. This technique frames the selection of test input as an optimization problem - we seek a set of test cases that meets some measurable goal of a tester - and unleashes powerful metaheuristic search algorithms to identify the best possible test cases within a restricted timeframe. This chapter introduces two algorithms that can generate pytest-formatted unit tests, tuned towards coverage of source code statements. The chapter concludes by discussing more advanced concepts and gives pointers to further reading on how artificial intelligence can support developers and testers when unit testing software.
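As a concrete illustration of the "test generation as optimization" framing, here is a small Python sketch. The `classify_triangle` unit, the tracer-based coverage measurement, and plain random search are our simplified choices, not the chapter's algorithms: the fitness of a candidate test suite is the number of statements it covers, and the search keeps the best suite found within the budget.

```python
import random
import sys

# Hypothetical unit under test (illustrative only).
def classify_triangle(a: int, b: int, c: int) -> str:
    if a <= 0 or b <= 0 or c <= 0:
        return "invalid"
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"

def covered_statements(test_inputs):
    """Record which lines of the unit under test the given inputs execute."""
    lines = set()
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is classify_triangle.__code__:
            lines.add(frame.f_lineno)
        return tracer
    sys.settrace(tracer)
    try:
        for a, b, c in test_inputs:
            classify_triangle(a, b, c)
    finally:
        sys.settrace(None)
    return lines

def random_search(budget: int = 200, suite_size: int = 4):
    """Simplest possible metaheuristic: random search maximizing statement coverage."""
    best_suite, best_fitness = None, -1
    for _ in range(budget):
        suite = [tuple(random.randint(-2, 10) for _ in range(3)) for _ in range(suite_size)]
        fitness = len(covered_statements(suite))  # fitness = number of covered statements
        if fitness > best_fitness:
            best_suite, best_fitness = suite, fitness
    return best_suite, best_fitness
```

A real search-based generator would use a smarter metaheuristic (e.g. a genetic algorithm) and finer-grained fitness signals, but the structure - candidate suites, a measurable coverage goal, and a search loop under a time budget - is the same.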

When engineers train deep learning models, they are very much 'flying blind'. Commonly used methods for real-time training diagnostics, such as monitoring the train/test loss, are limited. Assessing a network's training process solely through these performance indicators is akin to debugging software without access to internal states through a debugger. To address this, we present Cockpit, a collection of instruments that enable a closer look into the inner workings of a learning machine, and a more informative and meaningful status report for practitioners. It facilitates the identification of learning phases and failure modes, like ill-chosen hyperparameters. These instruments leverage novel higher-order information about the gradient distribution and curvature, which has only recently become efficiently accessible. We believe that such a debugging tool, which we open-source for PyTorch, is a valuable help in troubleshooting the training process. By revealing new insights, it also more generally contributes to explainability and interpretability of deep nets.
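The sketch below is not Cockpit's actual API; it is a plain-PyTorch illustration of the underlying idea, namely inspecting gradient statistics (here just global and per-layer gradient norms) alongside the loss while training.

```python
import torch

# Toy model and data; the point is the extra diagnostics, not the task.
model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = torch.nn.MSELoss()

for step in range(100):
    x, y = torch.randn(32, 20), torch.randn(32, 1)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # Simple "instruments": global and per-layer gradient norms, which can reveal
    # e.g. vanishing or exploding gradients that the loss curve alone hides.
    global_norm = torch.sqrt(sum((p.grad ** 2).sum() for p in model.parameters()))
    per_layer = {name: p.grad.norm().item() for name, p in model.named_parameters()}
    if step % 20 == 0:
        print(f"step {step:3d}  loss {loss.item():.4f}  grad norm {global_norm.item():.4f}")

    optimizer.step()
```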

The rapid development of Convolutional Neural Networks (CNNs) in recent years has triggered significant breakthroughs in many machine learning (ML) applications. The ability to understand and compare the various available CNN models is thus essential. The conventional approach of visualizing each model's quantitative features, such as classification accuracy and computational complexity, is not sufficient for a deeper understanding and comparison of the behaviors of different models. Moreover, most existing tools for assessing CNN behaviors only support comparison between two models and lack the flexibility to customize the analysis tasks according to user needs. This paper presents a visual analytics system, CNN Comparator (CNNC), that supports the in-depth inspection of a single CNN model as well as comparative studies of two or more models. The ability to compare a larger number of (e.g., tens of) models especially distinguishes our system from previous ones. With carefully designed model visualization and explanation support, CNNC facilitates a highly interactive workflow that promptly presents both quantitative and qualitative information at each analysis stage. We demonstrate CNNC's effectiveness in assisting ML practitioners in evaluating and comparing multiple CNN models through two use cases and one preliminary evaluation study using image classification tasks on the ImageNet dataset.

Data augmentation has been widely used to improve the generalizability of machine learning models. However, comparatively little work studies data augmentation for graphs. This is largely due to the complex, non-Euclidean structure of graphs, which limits possible manipulation operations. Augmentation operations commonly used in vision and language have no analogs for graphs. Our work studies graph data augmentation for graph neural networks (GNNs) in the context of improving semi-supervised node classification. We discuss practical and theoretical motivations, considerations, and strategies for graph data augmentation. Our work shows that neural edge predictors can effectively encode class-homophilic structure to promote intra-class edges and demote inter-class edges in a given graph structure, and our main contribution introduces the GAug graph data augmentation framework, which leverages these insights to improve performance in GNN-based node classification via edge prediction. Extensive experiments on multiple benchmarks show that augmentation via GAug improves performance across GNN architectures and datasets.
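The following is a toy sketch of the edge-prediction idea (our simplification, not the GAug implementation): given a trained edge predictor's probability for every node pair, add the most confident missing edges and drop the least confident existing ones before training the GNN.

```python
import numpy as np

def augment_adjacency(adj: np.ndarray, edge_probs: np.ndarray,
                      add_frac: float = 0.05, drop_frac: float = 0.05) -> np.ndarray:
    """adj: symmetric 0/1 adjacency matrix; edge_probs: an edge predictor's probability
    for each node pair. Returns a modified adjacency to use for GNN training."""
    n = adj.shape[0]
    new_adj = adj.copy()
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]  # undirected: i < j
    existing = [p for p in pairs if adj[p] == 1]
    missing = [p for p in pairs if adj[p] == 0]

    # Add edges the predictor is most confident about (likely intra-class under homophily).
    missing.sort(key=lambda p: edge_probs[p], reverse=True)
    for i, j in missing[: int(add_frac * len(existing))]:
        new_adj[i, j] = new_adj[j, i] = 1

    # Drop existing edges the predictor considers unlikely (likely inter-class or noisy).
    existing.sort(key=lambda p: edge_probs[p])
    for i, j in existing[: int(drop_frac * len(existing))]:
        new_adj[i, j] = new_adj[j, i] = 0
    return new_adj
```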

This paper seeks to develop a deeper understanding of the fundamental properties of neural text generation models. The study of artifacts that emerge in machine-generated text as a result of modeling choices is a nascent research area. Previously, the extent and degree to which these artifacts surface in generated text have not been well studied. In the spirit of better understanding generative text models and their artifacts, we propose the new task of distinguishing which of several variants of a given model generated a piece of text, and we conduct an extensive suite of diagnostic tests to observe whether modeling choices (e.g., sampling methods, top-$k$ probabilities, model architectures, etc.) leave detectable artifacts in the text they generate. Our key finding, which is backed by a rigorous set of experiments, is that such artifacts are present and that different modeling choices can be inferred by observing the generated text alone. This suggests that neural text generators may be more sensitive to various modeling choices than previously thought.
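A minimal sketch of the attribution task, under the assumption (ours, for illustration) that texts generated by different decoding variants of the same model are available with labels: if even a simple bag-of-words classifier predicts the variant well above chance, the variants leave detectable artifacts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def variant_attribution_accuracy(texts, variant_labels):
    """texts: generated strings; variant_labels: which model variant (e.g. a sampling
    method or top-k value) produced each one. Returns cross-validated accuracy of a
    simple classifier; accuracy well above chance indicates detectable artifacts."""
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    return cross_val_score(clf, texts, variant_labels, cv=5).mean()

# Usage (hypothetical data): texts generated under several decoding settings,
# labelled with the setting that produced them.
# accuracy = variant_attribution_accuracy(generated_texts, decoding_labels)
```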

Despite much success, deep learning generally does not perform well with small labeled training sets. In these scenarios, data augmentation has shown much promise in alleviating the need for more labeled data, but it so far has mostly been applied in supervised settings and achieved limited gains. In this work, we propose to apply data augmentation to unlabeled data in a semi-supervised learning setting. Our method, named Unsupervised Data Augmentation or UDA, encourages the model predictions to be consistent between an unlabeled example and an augmented unlabeled example. Unlike previous methods that use random noise such as Gaussian noise or dropout noise, UDA has a small twist in that it makes use of harder and more realistic noise generated by state-of-the-art data augmentation methods. This small twist leads to substantial improvements on six language tasks and three vision tasks even when the labeled set is extremely small. For example, on the IMDb text classification dataset, with only 20 labeled examples, UDA achieves an error rate of 4.20, outperforming the state-of-the-art model trained on 25,000 labeled examples. On standard semi-supervised learning benchmarks CIFAR-10 and SVHN, UDA outperforms all previous approaches and achieves an error rate of 2.7% on CIFAR-10 with only 4,000 examples and an error rate of 2.85% on SVHN with only 250 examples, nearly matching the performance of models trained on the full sets which are one or two orders of magnitude larger. UDA also works well on large-scale datasets such as ImageNet. When trained with 10% of the labeled set, UDA improves the top-1/top-5 accuracy from 55.1/77.3% to 68.7/88.5%. For the full ImageNet with 1.3M extra unlabeled data, UDA further pushes the performance from 78.3/94.4% to 79.0/94.5%.
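A simplified sketch of the objective described above (our condensation; the paper's additional techniques such as confidence thresholding and training-signal annealing are omitted): supervised cross-entropy on the small labeled batch plus a consistency term between predictions on an unlabeled example and its augmented counterpart.

```python
import torch
import torch.nn.functional as F

def uda_loss(model, x_labeled, y_labeled, x_unlabeled, x_unlabeled_aug, lam=1.0):
    """Simplified UDA objective: supervised loss + unsupervised consistency loss."""
    # Supervised term on the (small) labeled batch.
    sup = F.cross_entropy(model(x_labeled), y_labeled)

    # Consistency term: predictions on the clean unlabeled example serve as a fixed
    # target (no gradient), and the augmented view is pushed toward them.
    with torch.no_grad():
        target = F.softmax(model(x_unlabeled), dim=-1)
    log_pred_aug = F.log_softmax(model(x_unlabeled_aug), dim=-1)
    consistency = F.kl_div(log_pred_aug, target, reduction="batchmean")

    return sup + lam * consistency
```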

Two types of knowledge, factoid knowledge from graphs and non-factoid knowledge from unstructured documents, have been studied for knowledge-aware open-domain conversation generation: edge information in graphs can help knowledge selectors generalize, and the text sentences of non-factoid knowledge provide rich information for response generation. Fusing knowledge triples and sentences might yield mutually reinforcing advantages for conversation generation, but this has received little study. To address this challenge, we propose a knowledge-aware chatting machine with three components: an augmented knowledge graph containing both factoid and non-factoid knowledge, a knowledge selector, and a response generator. We formulate knowledge selection on the graph as a problem of multi-hop graph reasoning, which is more flexible than previous one-hop knowledge selection models. To fully leverage the long text information that differentiates our graph from others, we improve a state-of-the-art reasoning algorithm with machine reading comprehension technology. We demonstrate that, supported by such unified knowledge and this knowledge selection method, our system can generate more appropriate and informative responses than baselines.

Because of continuous advances in mathematical programming, Mixed Integer Optimization has become competitive vis-a-vis popular regularization methods for selecting features in regression problems. The approach exhibits unquestionable foundational appeal and versatility, but also poses important challenges. We tackle these challenges, reducing the computational burden of tuning the sparsity bound (a parameter which is critical for effectiveness) and improving performance in the presence of feature collinearity and of signals that vary in nature and strength. Importantly, we render the approach efficient and effective in applications of realistic size and complexity - without resorting to relaxations or heuristics in the optimization, or abandoning rigorous cross-validation tuning. Computational viability and improved performance in subtler scenarios are achieved with a multi-pronged blueprint, leveraging characteristics of the Mixed Integer Programming framework and by means of whitening, a data pre-processing step.
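For reference, the mixed-integer formulation of best-subset selection that this line of work builds on can be written as follows (our summary of the standard big-$M$ formulation, not necessarily the paper's exact variant), where $k$ is the sparsity bound tuned by cross-validation and $M$ bounds the coefficient magnitudes:

$$
\min_{\beta \in \mathbb{R}^p,\; z \in \{0,1\}^p} \ \tfrac{1}{2}\lVert y - X\beta \rVert_2^2
\quad \text{s.t.} \quad -M z_j \le \beta_j \le M z_j \ \ (j = 1,\dots,p), \qquad \sum_{j=1}^{p} z_j \le k.
$$

The binary variable $z_j$ switches feature $j$ on or off, so the cardinality constraint directly controls how many features enter the regression.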

Meta-learning is a powerful tool that builds on multi-task learning to learn how to quickly adapt a model to new tasks. In the context of reinforcement learning, meta-learning algorithms can acquire reinforcement learning procedures to solve new problems more efficiently by meta-learning prior tasks. The performance of meta-learning algorithms critically depends on the tasks available for meta-training: in the same way that supervised learning algorithms generalize best to test points drawn from the same distribution as the training points, meta-learning methods generalize best to tasks from the same distribution as the meta-training tasks. In effect, meta-reinforcement learning offloads the design burden from algorithm design to task design. If we can automate the process of task design as well, we can devise a meta-learning algorithm that is truly automated. In this work, we take a step in this direction, proposing a family of unsupervised meta-learning algorithms for reinforcement learning. We describe a general recipe for unsupervised meta-reinforcement learning, and describe an effective instantiation of this approach based on a recently proposed unsupervised exploration technique and model-agnostic meta-learning. We also discuss practical and conceptual considerations for developing unsupervised meta-learning methods. Our experimental results demonstrate that unsupervised meta-reinforcement learning effectively acquires accelerated reinforcement learning procedures without the need for manual task design, significantly exceeds the performance of learning from scratch, and even matches performance of meta-learning methods that use hand-specified task distributions.
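At a high level, the recipe can be summarized by the skeleton below (our sketch; `learn_skill_discriminator` and `maml_meta_train` are hypothetical placeholders standing in for an unsupervised skill-discovery method such as DIAYN and for model-agnostic meta-learning, respectively):

```python
def unsupervised_meta_rl(env, learn_skill_discriminator, maml_meta_train,
                         num_skills=20, meta_iterations=1000):
    """Skeleton of the recipe: acquire tasks without supervision, then meta-learn on them.
    The two sub-procedures are passed in as callables; implementing them is out of scope."""
    # 1) Unsupervised task acquisition: learn diverse skills and a discriminator
    #    q(skill | state) from reward-free interaction with the environment.
    discriminator = learn_skill_discriminator(env, num_skills)

    # 2) Each skill induces a self-generated task: reward the agent for visiting
    #    states the discriminator attributes to that skill.
    tasks = [(lambda state, z=z: discriminator.log_prob(z, state)) for z in range(num_skills)]

    # 3) Meta-train on the induced task distribution, yielding an initialization
    #    that adapts quickly to new, human-specified reward functions.
    return maml_meta_train(env, tasks, meta_iterations)
```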

The predominant approach to language modeling to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we develop a finite-context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens. We propose a novel simplified gating mechanism that outperforms that of Oord et al. (2016) and investigate the impact of key architectural decisions. The proposed approach achieves state-of-the-art results on the WikiText-103 benchmark, even though it features long-term dependencies, as well as competitive results on the Google Billion Words benchmark. Our model reduces the latency to score a sentence by an order of magnitude compared to a recurrent baseline. To our knowledge, this is the first time a non-recurrent approach has been competitive with strong recurrent models on these large-scale language tasks.
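The gating mechanism in question is a gated linear unit: the output of a convolutional layer is modulated elementwise by a sigmoid gate, $h(X) = (X \ast W + b) \otimes \sigma(X \ast V + c)$. A minimal PyTorch sketch of one such block (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    """One gated convolutional layer: h(X) = (X*W + b) * sigmoid(X*V + c)."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # Left-pad so the convolution is causal: position t never sees tokens after t.
        self.pad = nn.ConstantPad1d((kernel_size - 1, 0), 0.0)
        self.conv = nn.Conv1d(channels, channels, kernel_size)   # X*W + b
        self.gate = nn.Conv1d(channels, channels, kernel_size)   # X*V + c

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, sequence_length)
        x = self.pad(x)
        return self.conv(x) * torch.sigmoid(self.gate(x))
```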

Authors of the related works above, in the order listed:
Afonso Fontes, Gregory Gay, Francisco Gomes de Oliveira Neto, Robert Feldt
Frank Schneider, Felix Dangel, Philipp Hennig
Xiwei Xuan, Xiaoyu Zhang, Oh-Hyun Kwon, Kwan-Liu Ma
Tong Zhao, Yozen Liu, Leonardo Neves, Oliver Woodford, Meng Jiang, Neil Shah
Yi Tay, Dara Bahri, Che Zheng, Clifford Brunk, Donald Metzler, Andrew Tomkins
Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, Quoc V. Le
Zhibin Liu, Zheng-Yu Niu, Hua Wu, Haifeng Wang
Ana Kenney, Francesca Chiaromonte, Giovanni Felici
Abhishek Gupta, Benjamin Eysenbach, Chelsea Finn, Sergey Levine
Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier
