Publications
2024
- [FSE] Towards Better Graph Neural Network-based Fault Localization Through Enhanced Code Representation. Md Nakhla Rafi, Dong Jae Kim, An Ran Chen, Tse-Hsun Chen, and Shaowei Wang. arXiv preprint arXiv:2404.04496, 2024.
Automatic software fault localization plays an important role in software quality assurance by pinpointing faulty locations for easier debugging. Coverage-based fault localization is a commonly used technique, which applies statistics on coverage spectra to rank faulty code based on suspiciousness scores. However, statistics-based approaches built on fixed formulae are often rigid, which calls for learning-based techniques. Among them, Grace, a graph neural network (GNN) based technique, has achieved state-of-the-art results due to its capacity to preserve coverage spectra, i.e., test-to-source coverage relationships, as a precise abstract-syntax-enhanced graph representation, mitigating the limitation of other learning-based techniques that compress the feature representation. However, such a representation does not scale: as software grows more complex, the coverage spectra and AST graphs grow with it, making the representation challenging to extend, let alone train a graph neural network on in practice. In this work, we propose a new graph representation, DepGraph, that reduces the complexity of the graph representation by 70% in nodes and edges by integrating an interprocedural call graph into the graph representation of the code. Moreover, we integrate additional features, namely code change information, into the graph as attributes so the model can leverage rich historical project data. We evaluate DepGraph using Defects4J 2.0.0, where it outperforms Grace by locating 20% more faults in Top-1 and improving the Mean First Rank (MFR) and the Mean Average Rank (MAR) by over 50%, while decreasing GPU memory usage by 44% and training/inference time by 85%. Additionally, in cross-project settings, DepGraph surpasses the state-of-the-art baseline with a 42% higher Top-1 accuracy, and 68% and 65% improvements in MFR and MAR, respectively. Our study demonstrates DepGraph’s robustness, achieving state-of-the-art accuracy and scalability for future extension and adoption.
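The coverage-spectrum ranking that these learning-based techniques build on can be illustrated with a small sketch. Ochiai, shown below, is one standard suspiciousness formula (the same baseline referenced in the ASE 2022 entry further down); the element names and toy spectra are made up for illustration and are not taken from the paper.

```python
from math import sqrt

def ochiai(failed_cov: int, passed_cov: int, total_failed: int) -> float:
    """Ochiai suspiciousness of a code element.

    failed_cov   -- number of failing tests that cover the element
    passed_cov   -- number of passing tests that cover the element
    total_failed -- total number of failing tests
    """
    denom = sqrt(total_failed * (failed_cov + passed_cov))
    return failed_cov / denom if denom else 0.0

# Toy coverage spectrum: element -> (covered by #failing tests, covered by #passing tests).
spectrum = {"Parser.parse": (3, 1), "Lexer.next": (1, 40), "Cache.get": (0, 25)}
total_failed = 3

ranking = sorted(spectrum, key=lambda e: ochiai(*spectrum[e], total_failed), reverse=True)
print(ranking)  # Parser.parse ranks first: covered by all failing tests and few passing ones
```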
- [ICSE] LLMParser: An Exploratory Study on Using Large Language Models for Log Parsing. Zeyang Ma, An Ran Chen, Dong Jae Kim, Tse-Hsun Chen, and Shaowei Wang. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 2024.
Logs are important in modern software development because they capture runtime information. Log parsing, which extracts structured information from unstructured log data, is the first step in many log-based analyses. Traditional log parsers face challenges in accurately parsing logs due to the diversity of log formats, which directly impacts the performance of downstream log-analysis tasks. In this paper, we explore the potential of using Large Language Models (LLMs) for log parsing and propose LLMParser, a log parser based on generative LLMs and few-shot tuning. We leverage four LLMs, Flan-T5-small, Flan-T5-base, LLaMA-7B, and ChatGLM-6B, in LLMParser. Our evaluation on 16 open-source systems shows that LLMParser achieves statistically significantly higher parsing accuracy than state-of-the-art parsers (a 96% average parsing accuracy). We further conduct a comprehensive empirical analysis of the effect of training size, model size, and pre-training LLM on log parsing accuracy. We find that smaller LLMs may be more effective than more complex LLMs; for instance, Flan-T5-base achieves comparable results to LLaMA-7B with a shorter inference time. We also find that using LLMs pre-trained on logs from other systems does not always improve parsing accuracy. While using a pre-trained Flan-T5-base shows an improvement in accuracy, a pre-trained LLaMA results in a decrease (of almost 55% in group accuracy). In short, our study provides empirical evidence for using LLMs for log parsing and highlights the limitations and future research directions of LLM-based log parsers.
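A minimal sketch of what log parsing produces, independent of the LLM-based approach in the paper: the goal is to split a raw log line into a constant template and its variable parameters. The log line and the hand-written regex below are made-up stand-ins for a learned parser.

```python
import re

# A raw, unstructured log line (made-up example).
line = "2024-03-01 12:00:05 INFO  Connection from 10.0.0.7 closed after 532 ms"

# Log parsing recovers the constant template and the variable parts.
pattern = re.compile(r"Connection from (?P<ip>\S+) closed after (?P<latency>\d+) ms")
match = pattern.search(line)

template = "Connection from <*> closed after <*> ms"
params = match.groupdict() if match else {}
print(template)  # Connection from <*> closed after <*> ms
print(params)    # {'ip': '10.0.0.7', 'latency': '532'}
```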
2023
- [ASE] Are They All Good? Studying Practitioners’ Expectations on the Readability of Log Messages. Zhenhao Li, An Ran Chen, Xing Hu, Xin Xia, Tse-Hsun (Peter) Chen, and Weiyi Shang. In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, 2023.
Developers write logging statements to generate logs that provide run-time information for various tasks. The readability of log messages in the logging statements (i.e., the descriptive text) is rather crucial to the value of the generated logs. Immature log messages may slow down or even obstruct the process of log analysis. Despite the importance of log messages, there is still a lack of standards on what constitutes good readability of log messages and how to write them. In this paper, we conduct a series of interviews with 17 industrial practitioners to investigate their expectations on the readability of log messages. Through the interviews, we derive three aspects related to the readability of log messages, including Structure, Information, and Wording, along with several specific practices to improve each aspect. We validate our findings through a series of online questionnaire surveys and receive positive feedback from the participants. We then manually investigate the readability of log messages in large-scale open source systems and find that a large portion (38.1%) of the log messages have inadequate readability. Motivated by such observation, we further explore the potential of automatically classifying the readability of log messages using deep learning and machine learning models. We find that both deep learning and machine learning models can effectively classify the readability of log messages with a balanced accuracy above 80.0% on average. Our study provides comprehensive guidelines for composing log messages to further improve practitioners’ logging practices.
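The classification setup the abstract alludes to can be pictured with a bag-of-words text classifier; this is only a toy sketch, not the deep learning and machine learning models evaluated in the paper, and the messages and labels are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training data: log messages labeled by readability.
messages = [
    "Failed to connect to database 'orders' after 3 retries, giving up",
    "err!!",
    "Cache warm-up finished: 1204 entries loaded in 2.3 s",
    "something happened",
]
labels = ["adequate", "inadequate", "adequate", "inadequate"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(messages, labels)
print(clf.predict(["something went wrong"]))  # likely 'inadequate' on this toy data
```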
2022
- [TSE] T-Evos: A Large-Scale Longitudinal Study on CI Test Execution and Failure. An Ran Chen, Tse-Hsun Peter Chen, and Shaowei Wang. IEEE Transactions on Software Engineering, 2022.
Continuous integration (CI) is widely adopted in software projects to reduce the time it takes to deliver changes to the market. To ensure software quality, developers also run regression test cases in a continuous fashion. The CI practice generates commit-by-commit software evolution data that provides great opportunities for future testing research. However, such data is often unavailable due to space limitations (e.g., developers only keep the data for a certain period) and the significant effort involved in re-running the test cases on a per-commit basis. In this paper, we present T-Evos, a dataset on test result and coverage evolution, covering 8,093 commits across 12 open-source Java projects. Our dataset includes the evolution of statement-level code coverage for every test case (both passed and failed), test results, all build information, code changes, and the corresponding bug reports. We conduct an initial analysis to give an overview of the dataset. In addition, we conduct an empirical study using T-Evos on the characteristics of test failures in CI settings. We find that test failures are frequent, and while most failures are resolved within a day, some failures require several weeks to resolve. We highlight the relationship between code changes and test failures, and provide insights for future automated testing research. Our dataset may be used for future testing research and benchmarking in CI. Our findings provide an important first step in understanding code coverage evolution and test failures in a continuous environment.
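To make the kind of record the dataset describes concrete, here is a hypothetical shape for one per-commit test execution; the actual T-Evos schema is defined by the dataset itself, and the field names below are only a reading aid.

```python
from dataclasses import dataclass, field

@dataclass
class TestExecution:
    """Hypothetical per-commit test record, not the real T-Evos schema."""
    commit: str                                               # commit SHA the test was run against
    test_case: str                                            # fully qualified test name
    outcome: str                                              # "pass" or "fail"
    covered_statements: dict = field(default_factory=dict)    # file -> set of covered line numbers
    linked_bug_report: str | None = None                      # issue key, if the failure was reported

run = TestExecution(
    commit="a1b2c3d",
    test_case="org.example.FooTest#testBar",
    outcome="fail",
    covered_statements={"src/main/java/Foo.java": {10, 11, 12}},
    linked_bug_report="PROJ-123",
)
```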
- [ASE] How Useful is Code Change Information for Fault Localization in Continuous Integration? An Ran Chen, Tse-Hsun Chen, and Junjie Chen. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022.
Continuous integration (CI) is the process in which code changes are automatically integrated, built, and tested in a shared repository. In CI, developers frequently merge and test code under development, which helps isolate faults with finer-grained change information. To identify faulty code, prior research has widely studied and evaluated the performance of spectrum-based fault localization (SBFL) techniques. While the continuous nature of CI requires code changes to be atomic and provides fine-grained information on which part of the system is being changed, traditional SBFL techniques do not benefit from it. To overcome this limitation, we propose to integrate code and coverage change information into fault localization under CI settings. First, code changes show how faults are introduced into the system and provide developers with a better understanding of the root cause. Second, coverage changes show how code coverage is impacted when faults are introduced. This change information can help limit the search space of code coverage, which offers more opportunities for improving fault localization techniques. Based on the above observations, we propose three new change-based fault localization techniques and compare them with Ochiai, a commonly used SBFL technique. We evaluate these techniques on 192 real faults from seven software systems. Our results show that all three change-based techniques outperform Ochiai on the Defects4J dataset. In particular, the improvement varies from 7% to 23% and from 17% to 24% for average MAP and MRR, respectively. Moreover, we find that our change-based fault localization techniques can be integrated with Ochiai, boosting its performance by up to 53% and 52% for average MAP and MRR, respectively.
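One illustrative reading of "limit the search space of code coverage" is sketched below: restrict the coverage spectrum to elements touched by the recent change before applying a ranking formula such as Ochiai. This is not the paper's actual techniques, only a hedged sketch; the data and element names are invented.

```python
from math import sqrt

def ochiai(failed_cov, passed_cov, total_failed):
    denom = sqrt(total_failed * (failed_cov + passed_cov))
    return failed_cov / denom if denom else 0.0

def rank_with_change_info(spectrum, total_failed, changed_elements):
    """Rank only elements touched by the change; fall back to all elements if none were covered."""
    pool = {e: s for e, s in spectrum.items() if e in changed_elements} or spectrum
    return sorted(pool, key=lambda e: ochiai(*pool[e], total_failed), reverse=True)

# Toy data: both elements are covered by the failing tests, but only Handler.process was changed.
spectrum = {"Handler.process": (2, 5), "Util.format": (2, 30)}
print(rank_with_change_info(spectrum, total_failed=2, changed_elements={"Handler.process"}))
```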
2021
- [TSE] Pathidea: Improving Information Retrieval-Based Bug Localization by Re-constructing Execution Paths Using Logs. An Ran Chen, Tse-Hsun Chen, and Shaowei Wang. IEEE Transactions on Software Engineering, 2021.
To assist developers with debugging and analyzing bug reports, researchers have proposed information retrieval-based bug localization (IRBL) approaches. IRBL approaches leverage the textual information in bug reports as queries to generate a ranked list of potential buggy files that may need further investigation. Although IRBL approaches have shown promising results, most prior research only leverages the textual information that is “visible” in bug reports, such as bug description or title. However, in addition to the textual description of the bug, developers also often attach logs in bug reports. Logs provide important information that can be used to re-construct the system execution paths when an issue happens and assist developers with debugging. In this paper, we propose an IRBL approach, Pathidea, which leverages logs in bug reports to re-construct execution paths and helps improve the results of bug localization. Pathidea uses static analysis to create a file-level call graph, and re-constructs the call paths from the reported logs. We evaluate Pathidea on eight open source systems, with a total of 1,273 bug reports that contain logs. We find that Pathidea achieves a high recall (up to 51.9 percent for Top@5). On average, Pathidea achieves an improvement that varies from 8 to 21 and 5 to 21 percent over BRTracer in terms of Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR) across studied systems, respectively. Moreover, we find that the re-constructed execution paths can also complement other IRBL approaches by providing a 10 and 8 percent improvement in terms of MAP and MRR, respectively. Finally, we conduct a parameter sensitivity analysis and provide recommendations on setting the parameter values when applying Pathidea.
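The path re-construction idea can be pictured as a traversal of a statically built file-level call graph starting from the files that emitted the reported logs. The graph and file names below are illustrative assumptions; Pathidea's actual analysis is described in the paper.

```python
from collections import deque

# Hypothetical file-level call graph: caller file -> callee files.
call_graph = {
    "Server.java": ["Handler.java", "Logger.java"],
    "Handler.java": ["Cache.java", "Db.java"],
    "Cache.java": [],
    "Db.java": [],
    "Logger.java": [],
}

def reachable_from(logged_files, call_graph):
    """Files reachable from those that produced the user-reported logs (BFS)."""
    seen, queue = set(logged_files), deque(logged_files)
    while queue:
        current = queue.popleft()
        for callee in call_graph.get(current, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

print(reachable_from({"Handler.java"}, call_graph))  # {'Handler.java', 'Cache.java', 'Db.java'}
```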
- [EMSE] Demystifying the Challenges and Benefits of Analyzing User-Reported Logs in Bug Reports. An Ran Chen, Tse-Hsun Chen, and Shaowei Wang. Empirical Software Engineering, 2021.
Logs in bug reports provide important debugging information for developers. During the debugging process, developers need to study the bug report and examine user-provided logs to understand the system executions that lead to the problem. Intuitively, user-provided logs illustrate the problems that users encounter and may help developers with the debugging process. However, some logs may be incomplete or inaccurate, which can make it difficult for developers to diagnose the bug and thus delay the bug fixing process. In this paper, we conduct an empirical study on the challenges that developers may encounter when analyzing user-provided logs, as well as the benefits these logs bring. In particular, we study both log snippets and exception stack traces in bug reports. We conduct our study on 10 large-scale open-source systems with a total of 1,561 bug reports with logs (BRWL) and 7,287 bug reports without logs (BRNL). Our findings show that: 1) BRWL take longer to resolve (medians range from 3 to 91 days) than BRNL (medians range from 1 to 25 days). We also find that reporters may not attach accurate or sufficient logs (i.e., developers often ask for additional logs in the Comments section of a bug report), which extends the bug resolution time. 2) Logs often provide a good indication of where a bug is located. Most bug reports (73%) have overlaps between the classes that generate the logs and their corresponding fixed classes. However, there is still a large number of bug reports with no overlap between the logged and fixed classes. 3) Our manual study finds that system execution information is often missing from the logs. Many logs only show the point of failure (e.g., an exception) and do not provide a direct hint at the actual root cause. In fact, through call graph analysis, we find that 28% of the studied bug reports have fixed classes that are reachable from the logged classes but not visible in the logs attached to the bug reports. In addition, some logging statements are removed from the source code as the system evolves, which may cause further challenges in analyzing the logs. In short, our findings highlight possible future research directions to better help practitioners attach or analyze logs in bug reports.
2019
- [ICSE] An Empirical Study on Leveraging Logs for Debugging Production Failures. An Ran Chen. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 2019.
In modern software development, maintenance is one of the most expensive processes. When end-users encounter software defects, they report the bug to developers by specifying the expected behavior and error messages (e.g., log messages). Then, they wait for a bug fix from the developers. However, on the developers’ side, it can be very challenging and expensive to debug the problem. To fix the bugs, developers often have to play the role of detectives: seeking clues in the user-reported log files or stack traces in a snapshot of a specific system execution. This debugging process may take several hours or even days. In this paper, we first look at the usefulness of user-reported logs. Then, we propose an automated approach to assist the debugging process by reconstructing the execution path. Our investigation shows that 31% of the time, developers further request logs from the reporter. Moreover, our preliminary results show that the re-constructed path illustrates the user’s execution. We believe that our approach offers a novel solution for debugging production failures.
- [Thesis] Studying and Leveraging User-Provided Logs in Bug Reports for Debugging Assistance. An Ran Chen. Concordia University, 2019.
Bug reports provide important information for developers to debug user-reported issues. During the debugging process, developers need to study the bug report and examine user-provided logs to understand the system execution paths that lead to the problem. Prior studies on bug reports also found that such user-provided logs often contain valuable debugging information for developers. In this thesis, we conduct a tool-assisted study of user-provided logs in bug reports. Our goal is to study the challenges that developers may encounter when analyzing the logs, and how many additional buggy classes the logs can help identify. In particular, we study both system-generated logs and exception stack traces. Our approach simulates developers’ debugging process by 1) identifying the location in the source code where the logs were generated, 2) re-constructing execution paths by finding the call paths that can be traced back from the logs, and 3) studying the additional buggy classes that the re-constructed execution paths identify. We conduct our study on eight large-scale open-source systems with a total of 1,145 bug reports that contain logs. We find that the execution paths cannot be constructed for 32% of the studied bug reports, since many logs can no longer be found in the source code due to code evolution, and users often provide logs that are generated by third-party frameworks. In the remaining cases, the re-constructed execution paths identify 15% additional buggy classes in 41% of the bug reports. Through a comprehensive manual study, we find that the main reason the re-constructed execution paths fail to identify additional buggy classes is that reporters often only attach logs that describe the unexpected behavior (e.g., stack traces) without additional logs to illustrate the system execution. In summary, this thesis highlights both the challenges and the potential of using user-provided logs to assist developers with debugging. It also reveals common issues with user-provided logs in bug reports and provides suggestions for future research.