List of accepted papers of the industry track:
FinExpert: Domain-Specific Test Generation for FinTech Systems
Tiancheng Jin, Qingshun Wang, Lihua Xu, Chunmei Pan, Liang Dou, Haifeng Qian, Liang He, and Tao Xie
(East China Normal University, China; New York University Shanghai, China; CFETS Information Technology, China; University of Illinois at Urbana-Champaign, USA)
To assure high quality of software systems, the comprehensiveness of the created test suite and the efficiency of the adopted testing process are crucial, especially in the FinTech industry, due to a FinTech system’s complicated system logic, mission-critical nature, and large test suite. However, the state of testing practice in the FinTech industry still relies heavily on manual effort. Our recent research contributed an approach that was the first attempt to automate the testing process at China Foreign Exchange Trade System (CFETS) Information Technology Co. Ltd., a subsidiary of China’s Central Bank that supports China’s foreign exchange transactions, and revealed that automating test generation for such a complex trading platform could help alleviate some of this manual effort. In this paper, we further investigate the dilemmas faced in testing the CFETS trading platform, identify the importance of domain knowledge in its testing process, and propose a new approach of domain-specific test generation to further improve the effectiveness and efficiency of our previous approach in industrial settings. We also present findings from our empirical studies of conducting domain-specific testing on subsystems of the CFETS Trading Platform.
Design Diagrams as Ontological Source
Pranay Lohia, Kalapriya Kannan, Biplav Srivastava, and Sameep Mehta
(IBM Research, India; IBM Research, USA)
In custom software development projects, it is frequently the case that the same type of software is being built for different customers. The deliverables are similar because they address the same market (e.g., Telecom, Banking), have similar functions, or both. However, most organisations do not take advantage of this similarity and conduct each project from scratch, leading to lower margins and lower quality. Our key observation is that the similarity among the projects alludes to the existence of a veritable domain of discourse whose ontology, if created, would make the similarity across the projects explicit. Design diagrams are an integral part of any commercial software project's deliverables, as they document crucial facets of the software solution. We propose an approach to extract ontological information from UML design diagrams (class and sequence diagrams) and represent it as a domain ontology in a convenient representation. This ontology not only helps in developing a better understanding of the domain but also fosters software reuse for future software projects in that domain. Initial results on extracting ontologies from thousands of models from public repositories show that the created ontologies are accurate and help in better software reuse for new solutions.
Predicting Pull Request Completion Time: A Case Study on Large Scale Cloud Services
Chandra Maddila, Chetan Bansal, and Nachiappan Nagappan
(Microsoft Research, USA)
Effort estimation models have long been studied in software engineering research. They help organizations and individuals plan and track the progress of software projects and individual tasks, enabling better planning of delivery milestones. To this end, there is a large body of work on effort estimation for projects, but little work at the level of an individual check-in (pull request). In this paper we present a methodology that provides effort estimates for individual developer check-ins, which are displayed to developers to help them track their work items. The cloud development infrastructure pervasive in companies has enabled us to deploy our pull request lifetime prediction system to several thousand developers across multiple software families. We observe from our deployment that the system conservatively helps save 44.61% of developer time by accelerating pull requests to completion.
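The paper's model itself is not described in this listing; as a rough illustration of the idea, a regression model can estimate a pull request's completion time from simple check-in features. The features and synthetic data below are hypothetical, not from the paper:

```python
# Hedged sketch of pull-request lifetime prediction: a regression model over
# simple, hypothetical check-in features (lines changed, files touched,
# reviewer count). Illustrates the idea, not the paper's actual system.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 300
lines_changed = rng.integers(1, 2000, n)
files_touched = rng.integers(1, 40, n)
reviewers = rng.integers(1, 6, n)
# Synthetic "hours to complete": grows with change size, shrinks with reviewers.
hours = (2 + 0.01 * lines_changed + 0.5 * files_touched
         - 1.5 * reviewers + rng.normal(0, 2, n))

X = np.column_stack([lines_changed, files_touched, reviewers])
model = GradientBoostingRegressor(random_state=0).fit(X, hours)
```

In a deployment like the one described, the prediction would be surfaced next to the pull request so the author can plan follow-up work.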
TERMINATOR: Better Automated UI Test Case Prioritization
Zhe Yu, Fahmid Fahid, Tim Menzies, Gregg Rothermel, Kyle Patrick, and Snehit Cherian
(North Carolina State University, USA; LexisNexis, USA)
Automated UI testing is an important component of the continuous integration process of software development. A modern web-based UI is an amalgam of reports from dozens of microservices written by multiple teams. Queries on a page that opens up another will fail if any of that page's microservices fails. As a result, the overall cost for automated UI testing is high since the UI elements cannot be tested in isolation. For example, the entire automated UI testing suite at LexisNexis takes around 30 hours (3-5 hours on the cloud) to execute, which slows down the continuous integration process.
To mitigate this problem and give developers faster feedback on their code, test case prioritization (TCP) techniques are used to reorder the automated UI test cases so that more failures can be detected earlier. Given that much of automated UI testing is "black box" in nature, very little information (only the test case descriptions and testing results) can be utilized to prioritize these test cases. Hence, this paper evaluates 17 "black box" TCP approaches that do not rely on source code information. Among these, we propose a novel TCP approach that dynamically re-prioritizes the test cases when new failures are detected, by applying and adapting a state-of-the-art framework from the total recall problem. Experimental results on LexisNexis automated UI testing data show that our new approach, which we call TERMINATOR, outperformed prior state-of-the-art approaches in terms of failure detection rates with negligible CPU overhead.
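The dynamic re-prioritization idea can be sketched in a few lines: treat the test descriptions as the only features, and after each observed failure retrain a classifier to re-rank the remaining tests by predicted failure risk. This is a toy illustration of the general scheme, not TERMINATOR's actual algorithm; the test names and outcomes are invented:

```python
# Toy "black box" dynamic test prioritization: descriptions are the only
# features; once both a failure and a pass have been seen, remaining tests
# are re-ranked by a classifier's predicted failure probability.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def run_prioritized(descriptions, failing):
    """Simulate prioritized execution; `failing` plays the test oracle."""
    X = TfidfVectorizer().fit_transform(descriptions)
    remaining = list(range(len(descriptions)))
    executed, labels, order = [], [], []
    while remaining:
        if len(set(labels)) == 2:  # need both classes before training
            clf = LogisticRegression().fit(X[executed], labels)
            probs = clf.predict_proba(X[remaining])[:, 1]
            remaining = [t for _, t in
                         sorted(zip(probs, remaining), key=lambda p: -p[0])]
        t = remaining.pop(0)
        executed.append(t)
        labels.append(1 if t in failing else 0)
        order.append(t)
    return order

tests = [
    "login page renders report header",
    "login page report footer loads",
    "billing export to csv",
    "settings theme toggle",
    "login report pagination",
]
order = run_prioritized(tests, failing={0, 1, 4})
```

Because the last test shares vocabulary with the two observed failures, it is promoted ahead of the unrelated one once re-ranking kicks in.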
Risks and Assets: A Qualitative Study of a Software Ecosystem in the Mining Industry
Thomas Olsson and Ulrik Franke
(RISE SICS, Sweden)
Digitalization and servitization are impacting many domains, including the mining industry. As the equipment becomes connected and technical infrastructure evolves, business models and risk management need to adapt.
In this paper, we present a study on how changes in asset and risk distribution are evolving for the actors in a software ecosystem (SECO) and system-of-systems (SoS) around a mining operation.
We have performed a survey to understand how Service Level Agreements (SLAs) — a common mechanism for managing risk — are used in other domains. Furthermore, we have performed a focus group study with companies.
There is an overall trend in the mining industry to move the investment cost (CAPEX) from the mining operator to the vendors. The mining operator instead leases the equipment (as an operational expense, OPEX) or even acquires a service. This change in business model impacts operation, as knowledge moves from the mining operator to the suppliers. Furthermore, as the infrastructure becomes more complex, the mining operator grows increasingly reliant on the suppliers for operation and maintenance. As this change is still at an early stage, there is no formalized risk management, e.g. through SLAs, in place. Rather, at present, the companies in the ecosystem rely more on trust and the incentives created by the promise of mutual future benefits from innovation activities.
We believe there is a need to better understand how to manage risk in a SECO as it is established and evolves. At the same time, because the focus in a SECO is on cooperation and innovation, the companies have no incentive to address this unless there is an incident. Therefore, we believe industry needs help in systematically understanding risk and defining quality aspects such as reliability and performance in the new business environment.
Using Microservices for Non-intrusive Customization of Multi-tenant SaaS
Phu H. Nguyen, Hui Song, Franck Chauvel, Roy Muller, Seref Boyar, and Erik Levin
(SINTEF, Norway; Visma, Norway)
Enterprise software vendors often need to support their customer companies in customizing the enterprise software products deployed on the customers' premises. However, as software vendors migrate their products to cloud-based Software-as-a-Service (SaaS), the deep customization that used to be done on-premises is not applicable to the cloud-based multi-tenant context, in which all tenants share the same SaaS. Enabling tenant-specific customization in cloud-based multi-tenant SaaS requires a novel approach. This paper proposes a Microservices-based non-intrusive Customization framework for multi-tenant Cloud-based SaaS, called MiSC-Cloud. Non-intrusive deep customization means that the microservices customizing each tenant are isolated from the main software product and from the customization microservices of other tenants. MiSC-Cloud makes deep customization possible via authorized API calls through API gateways to the APIs of the customization microservices and the APIs of the main software product. We have implemented a proof-of-concept of our approach to enable non-intrusive deep customization of eShopOnContainers, Microsoft's open-source cloud-native reference application. Based on this work, we provide some lessons learned and directions for future work.
Predicting Breakdowns in Cloud Services (with SPIKE)
Jianfeng Chen, Joymallya Chakraborty, Philip Clark, Kevin Haverlock, Snehit Cherian, and Tim Menzies
(North Carolina State University, USA; LexisNexis, USA)
Maintaining web services is a mission-critical task where any downtime means loss of revenue and of reputation (as a reliable service provider). In the current competitive web services market, such a loss of reputation causes extensive loss of future revenue.
To address this issue, we developed SPIKE, a data mining tool that can predict upcoming service breakdowns half an hour into the future. Such predictions let an organization alert and assemble a tiger team to address the problem (e.g., by reconfiguring cloud hardware to reduce the likelihood of the breakdown).
SPIKE utilizes (a) regression tree learning (with CART); (b) synthetic minority over-sampling (to handle how rare spikes are in our data); (c) hyperparameter optimization (to learn best settings for our local data) and (d) a technique we called “topology sampling” where training vectors are built from extensive details of an individual node plus summary details on all their neighbors.
In the experiments reported here, SPIKE predicted service spikes 30 minutes into the future with recall and precision of 75% and above. SPIKE also performed relatively better than other widely used learning methods (neural nets, random forests, logistic regression).
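Components (a) and (b) of the pipeline can be sketched as follows. The oversampler below is a simplified, SMOTE-like interpolation between minority samples, and the data is synthetic; neither is the paper's exact implementation:

```python
# Hedged sketch of a SPIKE-style pipeline: a CART tree trained after
# synthetic minority oversampling of rare "spike" events.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def oversample_minority(X, y, rng=None):
    """Interpolate between minority samples until classes are balanced."""
    if rng is None:
        rng = np.random.default_rng(0)
    minority = X[y == 1]
    need = (y == 0).sum() - len(minority)
    synth = []
    for _ in range(max(need, 0)):
        a, b = minority[rng.integers(len(minority), size=2)]
        synth.append(a + rng.random() * (b - a))  # point on the segment a-b
    if synth:
        X = np.vstack([X, synth])
        y = np.concatenate([y, np.ones(len(synth), dtype=int)])
    return X, y

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 1.6).astype(int)          # rare event: roughly 5% positives
Xb, yb = oversample_minority(X, y)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(Xb, yb)
```

Without the oversampling step, a tree trained on such skewed data tends to predict the majority class everywhere; balancing the classes first lets it learn the spike boundary.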
DeepDelta: Learning to Repair Compilation Errors
Ali Mesbah, Andrew Rice, Emily Johnston, Nick Glorioso, and Edward Aftandilian
(University of British Columbia, Canada; University of Cambridge, UK; Google, UK; Google, USA)
Programmers spend a substantial amount of time manually repairing code that does not compile. We observe that the repairs for any particular error class typically follow a pattern and are highly mechanical. We propose a novel approach that automatically learns these patterns with a deep neural network and suggests program repairs for the most costly classes of build-time compilation failures. We describe how we collect all build errors and the human-authored, in-progress code changes that cause those failing builds to transition to successful builds at Google. We generate an AST diff from the textual code changes and transform it into a domain-specific language called Delta that encodes the change that must be made to make the code compile. We then feed the compiler diagnostic information (as source) and the Delta changes that resolved the diagnostic (as target) into a Neural Machine Translation network for training. For the two most prevalent and costly classes of Java compilation errors, namely missing symbols and mismatched method signatures, our system, called DeepDelta, generates the correct repair changes for 19,314 out of 38,788 (50%) of unseen compilation errors. The correct changes are in the top three suggested fixes 86% of the time on average.
WhoDo: Automating Reviewer Suggestions at Scale
Sumit Asthana, Rahul Kumar, Ranjita Bhagwan, Christian Bird, Chetan Bansal, Chandra Maddila, Sonu Mehta, and B. Ashok
(Microsoft Research, India; Microsoft Research, USA)
Today's software development is distributed, involves continuous changes for new features, and yet its development cycle has to be fast and agile. An important component of enabling this agility is selecting the right reviewers for every code change, the smallest unit of the development cycle. Modern tool-based code review has proven to be an effective way to achieve appropriate review of software changes. However, the selection of reviewers in these code review systems is at best manual. As software and teams scale, this poses the challenge of selecting the right reviewers, which in turn determines software quality over time. While previous work has suggested automatic approaches to code reviewer recommendation, it has been limited to retrospective analysis. We not only deploy a reviewer suggestion algorithm, WhoDo, and evaluate its effect, but also incorporate load balancing into it to address one of its major shortcomings: recommending experienced developers very frequently. We evaluate the effect of this hybrid recommendation and load-balancing system on five repositories within Microsoft. Our results examine various aspects of a commit and how code review affects them. We quantitatively answer questions that play a vital role in effective code review using our data, and substantiate the answers through qualitative feedback from partner repositories.
An IR-Based Approach towards Automated Integration of Geo-Spatial Datasets in Map-Based Software Systems
Nima Miryeganeh, Mehdi Amoui, and Hadi Hemmati
(University of Calgary, Canada; Localintel, Canada)
Data is arguably the most valuable asset of the modern world. In this era, the success of any data-intensive solution relies on the quality of the data that drives it. Among the vast amounts of data that are captured, managed, and analyzed every day, geospatial data is one of the most interesting classes: it holds geographical information about real-world phenomena and can be visualized as digital maps. Geospatial data is the source of many enterprise solutions that provide local information and insights. Companies often aggregate geospatial datasets from various sources in order to increase the quality of such solutions. However, the lack of a global standard model for geospatial datasets makes the task of merging and integrating datasets difficult and error-prone. Traditionally, this aggregation was accomplished by domain experts who manually validated the integration process, checking new data sources and/or new versions of previous data against conflicts and other requirement violations. However, this manual approach does not scale and hinders rapid releases when dealing with big datasets that change frequently, so more automated approaches requiring only limited interaction with domain experts are needed. As a first step toward tackling this problem, we have leveraged Information Retrieval (IR) and geospatial search techniques to propose a systematic and automated conflict identification approach. To evaluate our approach, we conduct a case study in which we measure its accuracy in several real-world scenarios, followed by interviews with Localintel software developers to gather their feedback.
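Conflict identification when merging geospatial datasets often boils down to deciding whether two records describe the same real-world entity. A minimal sketch under that assumption, combining text similarity over names with a geographic distance check (the thresholds and record fields are illustrative, not the paper's method):

```python
# Toy record matcher for geospatial dataset integration: two records are
# candidate duplicates/conflicts if their names overlap and they are close.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def same_entity(rec_a, rec_b, max_km=0.1, min_overlap=0.5):
    """Jaccard overlap of name tokens, gated by a distance threshold."""
    ta = set(rec_a["name"].lower().split())
    tb = set(rec_b["name"].lower().split())
    overlap = len(ta & tb) / len(ta | tb)
    close = haversine_km(rec_a["lat"], rec_a["lon"],
                         rec_b["lat"], rec_b["lon"]) <= max_km
    return close and overlap >= min_overlap

cafe_a = {"name": "Main Street Cafe", "lat": 51.0447, "lon": -114.0719}
cafe_b = {"name": "Main St Cafe", "lat": 51.0448, "lon": -114.0720}
```

A real system would use proper IR ranking and geospatial indexing rather than pairwise comparison, but the matching criterion has this general shape.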
Code Coverage at Google
Marko Ivanković, Goran Petrović, René Just, and Gordon Fraser
(Google, Switzerland; University of Washington, USA; University of Passau, Germany)
Code coverage is a measure of the degree to which a test suite exercises a software system. Although coverage is well established in software engineering research, deployment in industry is often inhibited by doubts about its usefulness and by the computational costs of analyzing coverage at scale. At Google, coverage information is computed for one billion lines of code daily, for seven programming languages. A key aspect of making coverage information actionable is to apply it at the level of changesets and code review.
This paper describes Google’s code coverage infrastructure and how the computed code coverage information is visualized and used. It also describes the challenges and solutions for adopting code coverage at scale. To study how code coverage is adopted and perceived by developers, this paper analyzes adoption rates, error rates, and average code coverage ratios over a five-year period, and it reports on 512 responses, received from surveying 3000 developers. Finally, this paper provides concrete suggestions for how to implement and use code coverage in an industrial setting.
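The changeset-level idea can be illustrated very simply: given the set of lines a change touches and the set of lines the test run covered, report coverage over just the changed lines. This is a minimal sketch of the concept, not Google's infrastructure:

```python
def changed_line_coverage(changed_lines, covered_lines):
    """Coverage restricted to the lines a changeset touches (per file)."""
    if not changed_lines:
        return 1.0  # nothing changed, so nothing is left uncovered
    return len(changed_lines & covered_lines) / len(changed_lines)
```

Surfacing this number during code review tells the reviewer whether the new lines, specifically, were exercised, which is far more actionable than a whole-repository percentage.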
When Deep Learning Met Code Search
Jose Cambronero, Hongyu Li, Seohyun Kim, Koushik Sen, and Satish Chandra
(Massachusetts Institute of Technology, USA; Facebook, USA; University of California at Berkeley, USA)
There have been multiple recent proposals on using deep neural networks for code search using natural language. Common across these proposals is the idea of embedding code and natural language queries into real vectors and then using vector distance to approximate semantic correlation between code and the query. Multiple approaches exist for learning these embeddings, including unsupervised techniques, which rely only on a corpus of code examples, and supervised techniques, which use an aligned corpus of paired code and natural language descriptions. The goal of this supervision is to produce embeddings that are more similar for a query and the corresponding desired code snippet.
Clearly, there are choices in whether to use supervised techniques at all, and if one does, what sort of network and training to use for supervision. This paper is the first to evaluate these choices systematically. To this end, we assembled implementations of state-of-the-art techniques to run on a common platform, training and evaluation corpora. To explore the design space in network complexity, we also introduced a new design point that is a minimal supervision extension to an existing unsupervised technique.
Our evaluation shows that: (1) adding supervision to an existing unsupervised technique can improve performance, though not necessarily by much; (2) simple networks for supervision can be more effective than more sophisticated sequence-based networks for code search; (3) while it is common to use docstrings to carry out supervision, there is a sizeable gap between the effectiveness of docstrings and a more query-appropriate supervision corpus.
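The idea shared across these proposals, embedding code and queries into vectors and ranking by vector distance, can be illustrated with a deliberately simple stand-in: a bag-of-tokens TF-IDF embedding and cosine similarity (real systems learn the embeddings with neural networks):

```python
# Toy vector-space code search: embed snippets and queries, rank by cosine
# similarity. TF-IDF stands in for learned embeddings; snippets are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

snippets = [
    "def read_file(path): return open(path).read()",
    "def sort_list(xs): return sorted(xs)",
    "def http_get(url): return requests.get(url).text",
]
# Split identifiers like read_file into word tokens so queries can match them.
vec = TfidfVectorizer(token_pattern=r"[A-Za-z]+")
code_vecs = vec.fit_transform(snippets)

def search(query):
    """Return the index of the snippet closest to the query in vector space."""
    q = vec.transform([query])
    return int(cosine_similarity(q, code_vecs)[0].argmax())
```

Supervised approaches replace the TF-IDF step with embeddings trained so that a query and its desired snippet land close together, which is exactly the design space the paper evaluates.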
FUDGE: Fuzz Driver Generation at Scale
Domagoj Babić, Stefan Bucur, Yaohui Chen, Franjo Ivančić, Tim King, Markus Kusano, Caroline Lemieux, László Szekeres, and Wei Wang
(Google, USA; Northeastern University, USA; University of California at Berkeley, USA)
At Google we have found tens of thousands of security and robustness bugs by fuzzing C and C++ libraries. To fuzz a library, a fuzzer requires a fuzz driver—which exercises some library code—to which it can pass inputs. Unfortunately, writing fuzz drivers remains a primarily manual exercise, a major hindrance to the widespread adoption of fuzzing. In this paper, we address this major hindrance by introducing the Fudge system for automated fuzz driver generation. Fudge automatically generates fuzz driver candidates for libraries based on existing client code. We have used Fudge to generate thousands of new drivers for a wide variety of libraries. Each generated driver includes a synthesized C/C++ program and a corresponding build script, and is automatically analyzed for quality. Developers have integrated over 200 of these generated drivers into continuous fuzzing services and have committed to address reported security bugs. Further, several of these fuzz drivers have been upstreamed to open source projects and integrated into the OSS-Fuzz fuzzing infrastructure. Running these fuzz drivers has resulted in over 150 bug fixes, including the elimination of numerous exploitable security vulnerabilities.
Industry Practice of Coverage-Guided Enterprise Linux Kernel Fuzzing
Heyuan Shi, Runzhe Wang, Ying Fu, Mingzhe Wang, Xiaohai Shi, Xun Jiao, Houbing Song, Yu Jiang, and Jiaguang Sun
(Tsinghua University, China; Alibaba Group, China; Villanova University, USA; Embry-Riddle Aeronautical University, USA)
Coverage-guided kernel fuzzing is a widely used technique that has helped kernel developers and testers discover numerous vulnerabilities. However, due to the high complexity of the application and hardware environment, there is little study on deploying fuzzing to enterprise-level Linux kernels. In this paper, collaborating with the enterprise developers, we present the industry practice of deploying kernel fuzzing on four different enterprise Linux distributions that are responsible for internal business and external services of the company. We address the following outstanding challenges encountered when deploying a popular kernel fuzzer, syzkaller, to these enterprise Linux distributions: absent coverage support, inconsistent kernel configurations, bugs in shallow paths, and the complexity of continuous fuzzing. This effort led to the detection of 41 reproducible, previously unknown bugs in these enterprise Linux kernels and 6 bugs with CVE IDs in the U.S. National Vulnerability Database, including flaws that cause general protection faults, deadlocks, and use-after-free errors.
Architectural Decision Forces at Work: Experiences in an Industrial Consultancy Setting
Julius Rueckert, Andreas Burger, Heiko Koziolek, Thanikesavan Sivanthi, Alexandru Moga, and Carsten Franke
(ABB Corporate Research, Germany; ABB Corporate Research, Switzerland)
The concepts of decision forces and the decision forces viewpoint were proposed to help software architects to make architectural decisions more transparent and the documentation of their rationales more explicit. However, practical experience reports and guidelines on how to use the viewpoint in typical industrial project setups are not available. Existing works mainly focus on basic tool support for the documentation of the viewpoint or show how forces can be used as part of focused architecture review sessions. With this paper, we share experiences and lessons learned from applying the decision forces viewpoint in a distributed industrial project setup, which involves consultants supporting architects during the re-design process of an existing large software system. Alongside our findings, we describe new forces that can serve as template for similar projects, discuss challenges applying them in a distributed consultancy project, and share ideas for potential extensions.
The Role of Limitations and SLAs in the API Industry
Antonio Gamez-Diaz, Pablo Fernandez, Antonio Ruiz-Cortés, Pedro J. Molina, Nikhil Kolekar, Prithpal Bhogill, Madhurranjan Mohaan, and Francisco Méndez
(University of Seville, Spain; Metadev, Spain; PayPal, USA; Google, USA; AsyncAPI Initiative, Spain)
As software architecture design is evolving to a microservice paradigm, RESTful APIs are being established as the preferred choice to build applications. In such a scenario, there is a shift towards a growing market of APIs where providers offer different service levels with tailored limitations typically based on the cost.
In this context, while there are well established standards to describe the functional elements of APIs (such as the OpenAPI Specification), having a standard model for Service Level Agreements (SLAs) for APIs may boost an open ecosystem of tools that would represent an improvement for the industry by automating certain tasks during the development such as: SLA-aware scaffolding, SLA-aware testing, or SLA-aware requesters.
Unfortunately, although there have been several proposals to describe SLAs for software in general and web services in particular over the past decades, a widely used standard is still lacking, due to the complex landscape of concepts surrounding the notion of SLAs and the multiple perspectives that can be addressed.
In this paper, we aim to analyze the landscape of SLAs for APIs in two directions: (i) clarifying the SLA-driven API development lifecycle, its activities, and its participants; and (ii) developing a catalog of relevant concepts and a subsequent prioritization based on different perspectives from both industry and academia.
As a main result, we present a scored list of concepts that paves the way to establish a concrete road-map for a standard industry-aligned specification to describe SLAs in APIs.
Evaluating Model Testing and Model Checking for Finding Requirements Violations in Simulink Models
Shiva Nejati, Khouloud Gaaloul, Claudio Menghi, Lionel C. Briand, Stephen Foster, and David Wolfe
(University of Luxembourg, Luxembourg; QRA, Canada)
Matlab/Simulink is a development and simulation language that is widely used by the Cyber-Physical System (CPS) industry to model dynamical systems. There are two mainstream approaches to verify CPS Simulink models: model testing that attempts to identify failures in models by executing them for a number of sampled test inputs, and model checking that attempts to exhaustively check the correctness of models against some given formal properties. In this paper, we present an industrial Simulink model benchmark, provide a categorization of different model types in the benchmark, describe the recurring logical patterns in the model requirements, and discuss the results of applying model checking and model testing approaches to identify requirements violations in the benchmarked models. Based on the results, we discuss the strengths and weaknesses of model testing and model checking. Our results further suggest that model checking and model testing are complementary and by combining them, we can significantly enhance the capabilities of each of these approaches individually. We conclude by providing guidelines as to how the two approaches can be best applied together.
Model Checking a C++ Software Framework: A Case Study
John Lång and I. S. W. B. Prasetya
(University of Helsinki, Finland; Utrecht University, Netherlands)
This paper presents a case study on applying two model checkers, Spin and Divine, to verify key properties of a C++ software framework, known as ADAPRO, originally developed at CERN. Spin was used for verifying properties on the design level. Divine was used for verifying simple test applications that interacted with the implementation. Both model checkers were found to have their own respective sets of pros and cons, but the overall experience was positive. Because both model checkers were used in a complementary manner, they provided valuable new insights into the framework, which would arguably have been hard to gain by traditional testing and analysis tools only. Translating the C++ source code into the modeling language of the Spin model checker helped to find flaws in the original design. With Divine, defects were found in parts of the code base that had already been subject to hundreds of hours of unit tests, integration tests, and acceptance tests. Most importantly, model checking was found to be easy to integrate into the workflow of the software project and to bring added value, not only as a verification methodology but also as a validation methodology. Therefore, using model checking for developing library-level code seems realistic and worth the effort.
Evolving with Patterns: A 31-Month Startup Experience Report
Miguel Ehécatl Morales-Trujillo and Gabriel Alberto García-Mireles
(University of Canterbury, New Zealand; Universidad de Sonora, Mexico)
Software startups develop innovative products under extreme conditions of uncertainty. At the same time, they represent a fast-growing sector of the economy and scale up research and technological advancement. This paper describes findings from observing a startup during its first 31 months of life. The data was collected through observations, unstructured interviews, and the technical and managerial documentation of the startup. The findings are based on a deductive analysis and summarized in 24 contextualized patterns that concern communication, interaction with customers, teamwork, and management. Furthermore, 13 lessons learned are presented with the aim of sharing experience with other startups. This industry report contributes to understanding the applicability and usefulness of startup patterns, providing valuable knowledge for the startup software engineering community.
Bridging the Gap between ML Solutions and Their Business Requirements using Feature Interactions
Guy Barash, Eitan Farchi, Ilan Jayaraman, Orna Raz, Rachel Tzoref-Brill, and Marcel Zalmanovici
(Western Digital, Israel; IBM Research, Israel; IBM, India)
Machine Learning (ML) based solutions are becoming increasingly popular and pervasive. When testing such solutions, there is a tendency to focus on improving ML metrics such as the F1-score and accuracy at the expense of ensuring business value and correctness by covering business requirements. In this work, we adapt test planning methods from classical software engineering to ML solutions. We use combinatorial modeling methodology to define the space of business requirements and map it to the ML solution data, and we use the notion of data slices to identify the weaker areas of the ML solution and strengthen them. We apply our approach to three real-world case studies and demonstrate its value.
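The data-slice idea can be sketched concretely: partition the evaluation data by combinations of business-relevant attributes and report accuracy per slice, so weak areas stand out even when the aggregate metric looks good. The feature names and records below are hypothetical, and this is an illustration of the notion rather than the paper's methodology:

```python
# Toy data slicing: group prediction outcomes by business attributes and
# compute accuracy per slice to surface the model's weak areas.
from collections import defaultdict

def slice_accuracy(records, features):
    """records: dicts holding feature values plus a boolean 'correct' flag."""
    stats = defaultdict(lambda: [0, 0])          # slice -> [correct, total]
    for r in records:
        key = tuple((f, r[f]) for f in features)
        stats[key][0] += r["correct"]
        stats[key][1] += 1
    return {k: c / t for k, (c, t) in stats.items()}

records = [
    {"region": "EU", "plan": "free", "correct": False},
    {"region": "EU", "plan": "free", "correct": False},
    {"region": "EU", "plan": "pro",  "correct": True},
    {"region": "US", "plan": "free", "correct": True},
    {"region": "US", "plan": "pro",  "correct": True},
]
acc = slice_accuracy(records, ["region", "plan"])
```

Here the overall accuracy is 60%, but slicing reveals that all errors concentrate in one requirement-relevant slice, which is exactly the kind of weak area the approach targets for strengthening.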
Design Thinking in Practice: Understanding Manifestations of Design Thinking in Software Engineering
Franziska Dobrigkeit and Danielly de Paula
(HPI, Germany; National University of Ireland at Galway, Ireland)
This industry case study explores where and how Design Thinking (DT) supports software development teams in their endeavour to create innovative software solutions. Design Thinking has found its way into software companies ranging from startups to SMEs and multinationals. It is mostly seen as a human-centered innovation approach or a way to elicit requirements in a more agile fashion. However, research in Design Thinking suggests that being exposed to DT changes the mindset of employees. This article therefore explores the wider use of DT within software companies through a case study in a multinational organization. Our results indicate that, once trained in DT, employees find various ways to implement it, not only as a pre-phase to software development but throughout their projects, even applying it to aspects of their surroundings such as the development process, team spaces, and teamwork. Specifically, we present a model of how DT manifests itself in a software development company.