My research addresses a serious problem in software engineering: defects introduced during software development. My research efforts have resulted in the development and validation of a broad range of interventions (e.g., changes in task design; process redesign; training; tools and other aids) that developers can use to find and prevent software faults and errors, which increases quality and reduces development costs, both vital goals in a competitive and fast-moving industry. My research has an empirical and multidisciplinary underpinning:
Empirical Software Engineering (ESE): My approach to ESE involves conducting experiments and gathering evidence to make informed choices about which tools and techniques are appropriate for use. I have experience designing and executing experiments in a variety of settings.
Multidisciplinary: My research capitalizes on a broad range of expertise drawn from different disciplines (medicine, aviation, wildlife biology, psychology, etc.) and adapts their methods to the tasks of improving software quality and supporting pedagogy in CS classrooms.
My most recent research efforts can be organized into four broad themes, briefly described below:
1) Integrating software engineering and psychology research;
2) Using probabilistic and statistical modeling for defect detection/estimation;
3) Using machine learning to improve the quality of software documents; and
4) CS and SE pedagogy.
1. Integrating Software Engineering (SE) and Psychology Research: My research leverages findings from cognitive, social, and behavioral psychology to address a diverse set of SE research problems. Some of these are discussed below:
a. Integrating SE and Cognitive Error Models to Improve Software Quality:
Problem Statement – Correcting faults in software projects incurs considerable costs for software organizations, especially when fault detection occurs late in the development process. Various approaches (RCA, ODC, the 5 Whys, etc.) have been introduced that seek to identify each fault's point of origin so that appropriate procedural changes can be made. In practice, retrospective fault-causal analysis suffers from inter-analyst reliability issues, fails to distinguish between causal and situational factors, and lacks a comprehensive theory of fault production that could help generalize and extend prevention and mitigation efforts to other related issues.
Solution – Faults do not arise out of thin air; rather, they result from problematic activities occurring prior to and during the creation of the software artifact in which the fault is recorded (especially in the requirements phase). Psychologists refer to these antecedent problems as human errors: failings of human cognition during problem solving, planning, or execution. My work uses human error research to add structure and a solid theoretical framework to the use of error information in improving software quality, and to provide insights into the cognitive aspects of software development.
Outcomes – This work has been supported by collaborative NSF CCF awards (with Dr. Jeffrey Carver at UA) and used human error theories (in consultation with Dr. Gary Bradshaw, a cognitive psychology expert) to develop the Human Error Taxonomy (HET) of requirements engineering (RE) errors. The usefulness of the HET has been validated through a number of studies at U.S. institutions (UA, MSU, NDSU) and with working professionals (Microsoft, CAPS, HCL). A workshop at ICSE'15 (which included SE and psychology experts) began the collection and discussion of human errors in the SE community, and a workshop at REFSQ'17 later disseminated results to researchers and practitioners. Other researchers have used the HET to characterize the incidence of human errors in different domains (e.g., space software in Brazil). The results from the industrial studies were presented in the Best Industry Paper session at ISSRE, and the work on a human error curricula initiative won the Exemplary CS Education Research Paper award at SIGCSE 2017.
b. Using Cognitive Learning Patterns to Guide the Selection of Skilled Inspectors in the Software Industry:
Problem Statement – Software organizations employ inspections (peer reviews) to find and fix faults early, improving quality and avoiding rework costs. An inspection starts with the inspection leader selecting a team of skilled individuals to perform it. While overall inspection effectiveness relies on each individual's ability to detect faults (which varies significantly), results at major U.S. companies (e.g., Microsoft) showed that individual factors (e.g., educational background or task experience) cannot be relied upon to predict an inspector's effectiveness (i.e., fault count) or efficiency (fault rate). Project managers need objective information on how to select a group of skilled inspectors in order to maximize the effectiveness of an inspection process.
Solution – An individual's cognitive learning style describes the way they think, perceive and process information, remember, and solve problems. Based on the mind styles models developed by psychologists, my research evaluated the perceptual (abstract/concrete) and ordering (sequential/random) abilities of inspectors. This learning-theory approach is further supplemented by eye tracking, which analyzed eye-movement patterns (e.g., fixations, dwells, saccades) to characterize inspectors and guide their selection.
Outcomes – This work has been supported by the NDSU Center for Visual and Cognitive Neuroscience (CVCN, an NIH COBRE grant) in collaboration with psychology experts at the center. Guided by multivariate analysis techniques (e.g., PCA, clustering), this work resulted in a tool (which takes cognitive learning styles as input) to assist project managers in selecting inspectors who would achieve maximum fault coverage. Early research results on the individual characteristics of skilled inspectors in controlled settings at NDSU (in the context of requirements inspection) received the Best Research Paper Award. The replication of the research design with working professionals (in the U.S. and India) received a Best Industry Paper Award nomination at ISSRE'16. The eye tracking results presented at ESEM'16 provided useful insights into the reading patterns (e.g., low frequency of fixations, long fixation times, size of saccades) of effective inspectors. More recently, an unsupervised machine learning approach was able to predict the abilities of inspectors based on their eye movements and cognitive learning styles. Our latest results (Best Disruptive Paper Award nominee) showed that inspectors' cognitive reading patterns (fixations, saccades, and timing data) can be used to select the most effective inspectors for detecting different fault types (e.g., missing information, incorrect requirements). In particular, an ensemble learning method (Random Forest) predicted effective inspectors with up to 94% accuracy.
c. Development of Behavior Marker (BM) Systems for Measuring and Evaluating the Non-Technical Skills of Software Engineers:
Problem – Non-technical (NT) skills (e.g., teamwork, listening, critical thinking, leadership, integrity) are the social and cognitive skills that complement software professionals' technical skills. Managing developers' NT skills is particularly important for today's teams because more and more organizations are using Agile methodologies, which rely less on documentation and more on people. There are no proven tools to effectively measure the NT skills of software developers, yet professional software organizations feel that these skills need to be tracked, with feedback provided, so that software development team members can improve.
Solution – My research leveraged work done in aviation (training and assessing flight crews) and other industries (healthcare, maritime) that have used behavioral marker (BM) systems to structure individual and team assessments of these NT skills. BMs are specific, observable NT behaviors (not personality traits) that contribute to superior or substandard performance within a work environment. They are derived by analyzing data (rather than gut feelings) on performance that contributes to successful and unsuccessful outcomes, using a taxonomy of NT skills associated with effective job performance.
Outcomes – My research (in collaboration with my former PhD student and the University of Helsinki, and supported by a Finnish grant) resulted in the development and validation of the Non-Technical Skills Assessment tool for Software Development Teams (NTSA). NTSA was developed via a literature search, focus group sessions with industry, and consultation with a former director of the Project Management Institute (www.pmi.org). NTSA is structured as a BM audit tool (including examples of good and bad behavior for each NT skill) and is designed to be used by an observer (i.e., a manager, team leader, or coach) during routine team interactions or meetings. NTSA was validated at the University of Helsinki's Software Factory (a live software development lab in Finland), where raters independently analyzed video footage of projects to evaluate members' NT skills. Based on the inter-rater reliability results, NTSA can be used reliably (with minimal training) by managers to identify areas in which a team's NT skills could be improved.
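Inter-rater reliability of this kind is commonly quantified with chance-corrected agreement statistics such as Cohen's kappa. The sketch below illustrates the computation on hypothetical ratings (the scale, data, and function name are illustrative, not taken from the NTSA study):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for
    the agreement expected by chance alone."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of items the raters scored identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal frequencies.
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of six observed team behaviors.
a = ["good", "good", "poor", "acceptable", "good", "poor"]
b = ["good", "acceptable", "poor", "acceptable", "good", "poor"]
print(round(cohens_kappa(a, b), 2))  # -> 0.75
```

Values above roughly 0.6 are conventionally read as substantial agreement, which is the kind of threshold an audit tool like NTSA must clear before untrained managers can rely on it.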
2. Probabilistic Sampling and Statistical Modeling: Another aspect of my research uses probabilistic sampling (e.g., curve fitting), statistical models (capture-recapture), and metrics (inspection cost metrics) to support the estimation of software defect counts and effort.
Problem – While inspections are effective, they only certify the presence of defects; they do not tell us how many additional defects may remain post-inspection. Additionally, while adding inspectors enables greater defect coverage, there is not enough empirical evidence on the real-time rework cost savings achieved by inspecting early work products (vs. testing during later stages).
Solution – My research adapted Capture-Recapture (CR) models (originally developed by biologists to estimate wildlife abundance in a closed population) to estimate the number of defects in software artifacts. By modeling inspector (individual) and defect heterogeneity in capture probabilities, CR can estimate the total number of defects in an artifact. The difference between the estimated number of defects and the number actually found provides an estimate of how many remain. Additionally, using the concept of virtual testing cost (the cost that would be spent in the absence of inspections), the economics of defect detection and inspection team size can be analyzed.
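As a minimal illustration of the CR idea, the classic two-inspector Lincoln-Petersen estimator (in its bias-corrected Chapman form) can be sketched as follows. The studies themselves used richer models that account for heterogeneity in capture probabilities; the inspection numbers here are hypothetical:

```python
def lincoln_petersen(n1, n2, m):
    """Two-inspector capture-recapture estimate of total defects.
    n1, n2: defects found by inspectors 1 and 2; m: defects found by both.
    Chapman's bias-corrected form also avoids division by zero when m == 0."""
    return (n1 + 1) * (n2 + 1) // (m + 1) - 1

# Hypothetical inspection: inspector A reports 15 defects, inspector B
# reports 12, and 9 defects appear on both lists.
total_est = lincoln_petersen(15, 12, 9)   # estimated defects in the artifact
found = 15 + 12 - 9                       # unique defects actually found
remaining = total_est - found             # estimated defects still latent
print(total_est, remaining)  # -> 19 1
```

The intuition: heavy overlap between the two defect lists suggests the inspectors have nearly exhausted the population, while little overlap suggests many defects remain uncaught.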
Outcomes – I have performed several empirical studies to evaluate the use of the CR method in academic and industrial settings. Using data from 73 inspectors at Microsoft, the results provide information regarding the minimum number of inspectors, appropriate probabilistic models, and trade-offs between inspection costs and testing rework savings. This work has been published at several international conferences (ICSE – the top-rated SE conference, ICST, ISSRE, and ESEM) and won a Best Industry Paper award at an international software reliability conference.
3. Using NLP and ML to Improve the Quality of Software Artifacts: My recent research includes the application of Natural Language Processing (NLP) and Machine Learning (ML) approaches to automate the review validation and fault fixation processes.
Problem – Software artifacts often need domain-expert reviewers to find the faults recorded in a document. The author of the artifact then has to manually validate each fault reported during the review to remove false positives and identify which areas need revision. This is a tedious, time-consuming, and error-prone step (e.g., fault fixation can introduce new faults). Can we automate the fault validation and fixation steps? More importantly, can we build models using developers' and inspectors' past fault data to predict future behavior?
Solution – To automate the fault validation step, ML approaches (e.g., supervised learning) can help categorize useful vs. non-useful reviews. NLP (e.g., POS tags) and graph mining (e.g., clique mining) can identify related requirements to support the fault fixation process.
Outcomes – This research found that an ensemble of classifiers, when used with POS tagging (nouns and adjectives), can help filter out false-positive (non-useful) reviews, but the classification accuracy was only 55–65% (a large classification error rate). By adding a priority class (confidence value) for each review and collective POS tags, the accuracy was improved to 90%. This research also validated different classifiers for reviews belonging to different fault types: Ambiguity (A), Omission (O), Incorrect Fact (IF), Inconsistent Information (II), Extraneous (E), and Miscellaneous (M). This work has been published in ML and RE venues.
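The noun/adjective intuition behind the POS-based features can be sketched with a toy threshold rule standing in for the trained ensemble. The function names, thresholds, and example review below are illustrative only, and reviews are assumed to arrive pre-tagged as (token, POS) pairs from an external tagger:

```python
def pos_features(tagged_review):
    """Proportion of nouns (NN*) and adjectives (JJ*) in a tagged review;
    these content-bearing tags were the most predictive features."""
    nouns = sum(1 for _, tag in tagged_review if tag.startswith("NN"))
    adjs = sum(1 for _, tag in tagged_review if tag.startswith("JJ"))
    total = max(len(tagged_review), 1)
    return nouns / total, adjs / total

def is_useful(tagged_review, noun_min=0.2, adj_min=0.05):
    """Toy stand-in for the trained classifier ensemble: useful reviews
    tend to carry a higher share of nouns and adjectives (they name
    concrete artifacts and qualities) than vague, non-useful ones."""
    noun_ratio, adj_ratio = pos_features(tagged_review)
    return noun_ratio >= noun_min and adj_ratio >= adj_min

# A hypothetical review naming a concrete defect passes the filter...
review = [("The", "DT"), ("boundary", "NN"), ("check", "NN"),
          ("is", "VBZ"), ("missing", "JJ")]
print(is_useful(review))  # -> True
# ...while a content-free remark does not.
print(is_useful([("looks", "VBZ"), ("fine", "RB")]))  # -> False
```

In the actual studies this filtering was done by trained classifiers over such features, not fixed thresholds; the sketch only shows why noun/adjective density is informative.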
4. Discipline-Based (CS and SE) Educational Research: One of the most rewarding aspects of my research has been the cross-fertilization of research and pedagogy. My CS/SE education research is based on learning theories backed by strong empirical evidence and has been funded by the NSF IUSE and TUES programs. The results have been published in TOCE, CoED, SIGCSE, ASEE, FECS, CSEET, and CrossTalk. I am honored to be a member of the DBER faculty cohort at NDSU (and advise 4 STEM PhD students in SE). A summary of my major educational research efforts is listed below:
a) Improving Software Testing Skills: Based on observations at NDSU, students do not obtain sufficient testing knowledge and skills while completing an undergraduate CS degree, which causes problems during their education and in the workplace. I have worked with CS/IT instructors to address this problem and to better support testing pedagogy in CS classrooms:
Web-based Repository of Software Testing Tutorials (WReSTT): As part of an NSF-funded TUES 2 collaborative project, WReSTT was developed to improve students' conceptual and practical understanding of software testing through integration into CS/IT programming courses. We developed vetted learning materials (e.g., testing tools and tutorials), developed faculty expertise (via NSF-sponsored faculty workshops) to support testing pedagogy, and have impacted over 1,000 students at 30 institutions.
Testing Tutor Pedagogy: Unlike other testing tools that provide raw coverage information, Testing Tutor provides feedback at a conceptual level (e.g., boundary value conditions) and systematically helps students improve their understanding of fundamental testing concepts (through the learning repository in WReSTT) so they can fully test their software.
b) Mental Models (MM) of Computer Programmers: Introductory CS programming courses at NDSU have a large incidence of dropouts and DFWs. My research noted that incoming CS1/CS2 students often construct non-viable mental models (a flawed understanding of a particular concept) when trying to understand programming concepts such as variable assignment and recursion. At NDSU, we developed and evaluated a student Mental Model (MM) test to gain insights into how CS students approach problem solving. This research led to two major outcomes: 1) use of the MM test to identify students with inconsistent mental models; and 2) validation of pair programming as an effective method of migrating students toward greater MM consistency.
c) Learning Engagement Strategies (LESs) to Improve Software Development Pedagogy: NSF funded our IUSE: Level 2: Design and Development Tier project (a collaborative grant of $1.84M), which provides an adaptive cyberlearning environment for software and programming courses (SEP-CyLE) based on a theoretical framework for LESs. These strategies include collaborative learning, problem-based learning, gamification, and social interaction. An IUSE Phase 3 proposal ($3M, Institutional and Community Change track) is under review to extend and integrate SEP-CyLE with virtual problem-based environments and gamification-supporting applications.
d) Improving Capstone Project Experiences: At NDSU, I am working with capstone instructors to evaluate (through feedback from students and sponsors, and interviews with industrial partners) a capstone process meeting CMMI maturity level 2 (the first improvement level, at which processes are characterized per project and include project management) and to extend it to incorporate agile development methodologies. A large part of my involvement is evaluating the CMMI-based process and using feedback from sponsors, instructors, and students to improve the model and the corresponding training materials.
Specifically, my research has focused on identifying and evaluating the knowledge and skill deficiencies of graduating CS students: skills they lack when first beginning a job in industry but that employers expect them to have. The published findings, based on interviews with 23 managers and hiring personnel (who sponsor our capstone projects) at software development companies in the USA and Europe, highlight the struggles that recent graduates face when first starting at those companies. Our results provide detailed descriptions of these deficiency areas along with recommendations for educators, industry managers, and recent graduates.