Using the structural location of terms to improve the results of text retrieval based approaches to feature location

Date
2015
Publisher
University of Alabama Libraries
Abstract

Software maintenance and evolution make up a considerable portion of the time and effort spent during the life cycle of a software system. During the maintenance and evolution phase, the majority of a developer’s time is spent on program comprehension tasks. Feature location (i.e., identifying a starting point for a change), impact analysis (i.e., identifying all source elements involved in a change), and software summarization (i.e., automatically summarizing the responsibilities of a source element) are examples of such tasks. Recent research in these areas has sought to ease the burden on developers and reduce the time spent on each task by exploiting textual information, dependency graphs, and execution traces. Furthermore, the success of text retrieval in other areas (e.g., traceability) has prompted new studies that automate feature location using text retrieval techniques such as the vector space model (VSM), latent semantic indexing (LSI), and latent Dirichlet allocation (LDA). Some research has sought to improve LSI and VSM models by combining structural information (i.e., information about the creation and use of objects and methods within the code) with the corpus extracted from source code. However, little research has focused on improving LDA and more sophisticated topic models (i.e., statistical models of the abstract topics that occur in a corpus) with structural information. Moreover, no study has examined how a developer’s knowledge of a software system’s structure can be incorporated into text-retrieval-based feature location for software maintenance tasks. The research presented in this dissertation makes two main contributions. First, it evaluates a methodology for incorporating structural information into the corpus obtained in the text extraction phase by modifying the weights of terms according to their importance to individual source elements.
Second, it introduces a novel technique for structured text retrieval that allows developers to use their existing knowledge of a software system’s structure. The dissertation is organized into the following parts: a demonstration of the effects of structural weighting schemes on the effectiveness of topic modeling in feature location; the introduction of a new structured source code retrieval model; a demonstration of the effects of structured queries on the effectiveness of structured source code retrieval for feature location; and additional insights into how and when these approaches should be incorporated.
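The two contributions above rest on two ideas: weighting a term by the structural location in which it occurs inside a source element, and letting a developer aim a query at particular structural locations. The following is a minimal sketch of both ideas using a generic fielded vector space model (tf-idf with cosine similarity); it is not the dissertation's retrieval model, and it uses VSM rather than LDA only to stay short. The field names (`name`, `params`, `comments`, `body`), the weights in `FIELD_WEIGHTS`, the `field:term` query syntax, and every function name are hypothetical assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical structural locations and weights, for illustration only;
# these are NOT the weighting schemes evaluated in the dissertation.
FIELD_WEIGHTS = {"name": 3.0, "params": 2.0, "comments": 1.5, "body": 1.0}


def tokenize(text):
    """Split identifiers on camelCase boundaries and non-letters; lowercase."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", text)]


def fielded_counts(method):
    """Count each term per structural field: {(field, term): count}."""
    counts = Counter()
    for field, text in method.items():
        for term in tokenize(text):
            counts[(field, term)] += 1
    return counts


def idf_table(all_counts):
    """Smoothed inverse document frequency per term, ignoring the field."""
    n = len(all_counts)
    df = Counter()
    for counts in all_counts:
        df.update({term for (_, term) in counts})
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def doc_vector(counts, idf):
    """Structurally weighted tf-idf vector over (field, term) dimensions."""
    return {(f, t): FIELD_WEIGHTS.get(f, 1.0) * c * idf[t]
            for (f, t), c in counts.items()}


def query_vector(query, idf):
    """'name:parse' targets one field; a bare 'parse' targets every field."""
    vec = {}
    for token in query.split():
        field, _, word = token.rpartition(":")
        fields = [field] if field else list(FIELD_WEIGHTS)
        for term in tokenize(word):
            for f in fields:
                vec[(f, term)] = idf.get(term, 0.0)
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(methods, query):
    """Return method ids sorted from most to least similar to the query."""
    counts = {mid: fielded_counts(m) for mid, m in methods.items()}
    idf = idf_table(list(counts.values()))
    q = query_vector(query, idf)
    scores = {mid: cosine(q, doc_vector(c, idf)) for mid, c in counts.items()}
    return sorted(scores, key=lambda mid: scores[mid], reverse=True)


# Two made-up methods, each split into hypothetical structural fields.
methods = {
    "XmlReader.parseXmlFile": {
        "name": "parseXmlFile", "params": "String path",
        "comments": "Reads an XML file from disk", "body": "open path read"},
    "Logger.logParseError": {
        "name": "logParseError", "params": "Exception error",
        "comments": "Writes a parse error to the log", "body": "write error"},
}

print(rank(methods, "parse xml"))       # → ['XmlReader.parseXmlFile', 'Logger.logParseError']
print(rank(methods, "name:log parse"))  # → ['Logger.logParseError', 'XmlReader.parseXmlFile']
```

In this sketch, structural weighting happens when the corpus is built (a term in a method name counts three times as much as the same term in the body), while the structured query restricts `log` to the `name` field, so only methods whose names mention logging gain from that term. A real system would also need to decide how fields are extracted from the parser's output and how the weights are tuned, which this example leaves open.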

Description
Electronic Thesis or Dissertation
Keywords
Computer science