Doctoral Degrees (Computer Science)

Permanent URI for this collection

https://scholar.sun.ac.za/handle/10019.1/96338

Browse

Now showing 1 - 2 of 2

Concept-based exploration of rich semi-structured data collections
(Stellenbosch : Stellenbosch University, 2017-03) Greene, Gillian J.; Fischer, Bernd; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences (Computer Science)
ENGLISH ABSTRACT : Search has become one of the fundamental operations in computer science, allowing users to extract data and ultimately information from datasets. However, when users have no previous knowledge of a dataset, or have not clearly defined their search task and are therefore unable to formulate a direct query, their task becomes one of exploratory search or browsing rather than focused search or retrieval. While search and retrieval tools are widely provided, support for browsing of large and especially rich semi-structured datasets, is lacking. Semi-structured data is particularly difficult to explore and query because treating it as complete free-text causes a loss of important additional information which is encoded in the structured portions of the data while considering only the structured fields results in the loss of important free-text information. We therefore develop a framework to support exploration of semi-structured data, which is otherwise difficult to gather insights from, without requiring the user to have prior knowledge of the dataset or have formulated a specific query. Our approach uses a novel combination of tag clouds and concept lattices to facilitate data exploration, analysis and visualization by allowing the user to continuously update (i.e., add and remove) a set of keywords that the searched documents must contain. The document set is not directly provided as the result of a specific query, but aggregated information and properties of relevant documents are provided as a result. We apply our framework to data contained in software repositories, in two different ways for different goals to highlight the flexibility of the approach and the different tasks that can be supported using the same underlying dataset. We also instantiate our framework to support the exploration of a large collection of academic publication data. We evaluate the instantiations of our framework by conducting user and case studies, which indicate that our approach is usable and allows users to gather valuable information from semi-structured data archives.
Untitled
(Stellenbosch : Stellenbosch University, 2023-03) Raselimo, Moeketsi; Fischer, Bernd; Stellenbosch University. Faculty of Science. Dept. of Computer Science.
ENGLISH ABSTRACT: Context-free grammars (CFGs) or simply grammars, are widely used to describe and encode the structure of complex objects. Systems that process such structured objects are known as grammarware. CFGs are also used for software engineering tasks such as testing grammarware, specifically in test suite generation and fuzzing. However, correct and complete CFGs are rarely available, which limits the benefits of these automated techniques. While current grammar engineering approaches can demonstrate the presence of faults in CFGs, none of them can be used to confirm the locations of these faults, much less repair them. In this thesis, we address this problem and develop, implement and evaluate novel automated methods that find locations of faults in CFGs and repair them against a test suite specification. We describe and evaluate the first spectrum-based method aimed at finding faults CFGs. In its basic form, it takes as input a test suite and a modified parser for the grammar that can collect grammar spectra, i.e., the sets of grammar elements used in attempts to parse the individual test cases, and returns as output a ranked list of suspicious elements. We define grammar spectra suitable for localizing faults on the level of the grammar rules and at the rules’ individual symbols, respectively. We show how these types of spectra can be collected in widely used parsing tools. We also show how the spectra can be constructed directly from test cases derived from a grammar, and how these synthetic spectra can be used for localization in cases where standalone parsers implementing the grammars are not available. We qualitatively and quantitatively demonstrate the effectiveness of our fault localization approach. We first evaluate our method over a large number of medium-sized single fault CFGs, which we constructed by fault seeding from a common origin grammar. At the rule level, it ranks the rules containing the seeded faults within the top five rules in about 40%–70% of the cases, and pinpoints them (i.e., correctly identifies them as unique most suspicious rule) in about 10%–30% of the cases, with significantly better results for the synthetic spectra. On average, it ranks the faulty rules within about 25% of all rules. At the item level, our method remains remarkably effective despite the larger number of possible locations: it typically ranks the seeded faults within the top five positions in about 30%–60% of the cases, and pinpoints them in about 15%–40% of the cases. On average, it ranks the seeded faults within about 10%–20% of all positions. We further evaluate our method over CFGs that contain real faults. We show that an iterative approach can be used to localize and manually remove one by one multiple faults in grammars submitted by students enrolled in various compiler engineering courses; in most iterations, the topranked rule already contains an error, and no error is ranked outside the top five ranked rules. We finally apply our method to a large open-source SQLite grammar and show where the original version deviates from the language accepted by the actual SQLite system. We then describe the first method to automatically repair faults in CFGs: given a grammar that fails some tests in a given test suite, we iteratively and gradually transform the grammar until it passes all tests. Our core idea is to build on spectrum-based fault localization to identify promising repair sites (i.e., specific positions in rules), and to apply grammar patches at these sites whenever they satisfy explicitly formulated pre-conditions necessary to potentially improve the grammar. We implement and evaluate passive and active repair variants of our approach. In passive repair, we repair against the fixed input test suite as specification. The active repair variant takes a black-box parser for the unknown target language or oracle that can answer membership queries. The key extension of active repair is to incorporate some test suite enrichment, by generating additional tests from each repair candidate and using the oracle to confirm the outcome of each of these tests. We demonstrate the effectiveness of both repair variants using thirtythree student grammars that contain multiple faults. We show that both variants are effective in fixing real faults in these grammars. A like-for-like comparison of both variants shows that active repair produces more high quality repairs than passive repair.

Browse

Browsing Doctoral Degrees (Computer Science) by browse.metadata.advisor "Fischer, Bernd"

Results Per Page

Sort Options