- Integrating Bayesian network structure into normalizing flows and variational autoencoders (Stellenbosch : Stellenbosch University, 2023-03) Mouton, Jacobie; Kroon, Steve; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: Deep generative models have become more popular in recent years due to their good scalability and representation capacity. However, these models do not typically incorporate domain knowledge. In contrast, probabilistic graphical models specifically constrain the dependencies between the variables of interest as informed by the domain. In this work, we therefore consider integrating probabilistic graphical models and deep generative models in order to construct models that are able to learn complex distributions, while remaining interpretable by leveraging prior knowledge about variable interactions. We specifically consider the type of domain knowledge that can be represented by Bayesian networks, and restrict our study to the deep generative frameworks of normalizing flows and variational autoencoders. Normalizing flows (NFs) are an important family of deep neural networks for modelling complex distributions as transformations of simple base distributions. Graphical flows add further structure to NFs, allowing one to encode non-trivial variable dependencies in these distributions. Previous graphical flows have focused primarily on a single flow direction: either the normalizing direction for density estimation, or the generative direction for inference and sampling. However, to use a single flow to perform tasks in both directions, the model must exhibit stable and efficient flow inversion. This thesis introduces graphical residual flows (GRFs): graphical flows based on invertible residual networks, which ensure stable invertibility by spectral normalization of their weight matrices.
Experiments confirm that GRFs provide performance competitive with other graphical flows for both density estimation and inference tasks. Furthermore, our model provides stable and accurate inversion that is also more time-efficient than alternative flows with similar task performance. We therefore recommend the use of GRFs over other graphical flows when the model may be required to perform reliably in both directions. Since flows employ a bijective transformation, the base or latent distribution must have the same dimensionality as the observed data. Variational autoencoders (VAEs) address this shortcoming by allowing practitioners to specify any number of latent variables. Initial work on VAEs assumed independent latent variables with simple prior and variational distributions. Subsequent work has explored incorporating more complex distributions and dependency structures: including NFs in the encoder network allows latent variables to entangle non-linearly, creating a richer class of distributions for the approximate posterior, and stacking layers of latent variables allows more complex priors to be specified. In this vein, this thesis also explores incorporating arbitrary dependency structures, as specified by Bayesian networks, into VAEs. This is achieved by extending both the prior and inference network with the above GRF, resulting in the structured invertible residual network (SIReN) VAE. We specifically consider GRFs, since the application of the flow in the VAE prior necessitates stable inversion. We compare our model's performance on several datasets to models that encode no special dependency structures, and show its potential to provide a more interpretable model as well as better generalization performance in data-sparse settings. We also identify posterior collapse, where some latent dimensions become inactive and are effectively ignored by the model, as an issue with SIReN-VAE, since it is linked with the encoded structure.
As such, we employ various combinations of existing approaches to alleviate this phenomenon.
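The invertibility property behind residual flows can be illustrated with a minimal numpy sketch. This is not the thesis's model: the layer sizes, the single tanh layer, and the 2x safety factor on the spectral norm are all illustrative. The idea is that spectral normalization keeps the residual branch contractive, so the layer can be inverted stably by Banach fixed-point iteration.

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_normalize(W, n_iters=30):
    """Estimate ||W||_2 by power iteration and rescale W so the
    residual branch is a contraction (Lipschitz constant < 1)."""
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v
    return W / (2.0 * sigma)  # illustrative safety factor: norm ~ 0.5

W = spectral_normalize(rng.normal(size=(3, 3)))

def g(x):
    """One-layer residual branch; tanh is 1-Lipschitz, so the branch
    inherits the (normalized) spectral norm of W."""
    return np.tanh(W @ x)

def forward(x):
    """Normalizing direction: y = x + g(x)."""
    return x + g(x)

def invert(y, n_iters=100):
    """Generative direction: fixed-point iteration x <- y - g(x),
    which converges because g is contractive."""
    x = y.copy()
    for _ in range(n_iters):
        x = y - g(x)
    return x

x = rng.normal(size=3)
y = forward(x)
print(np.allclose(invert(y), x))  # True
```

The same iteration is what makes stable inversion cheap: no explicit inverse of the network is ever formed, only repeated forward evaluations of the residual branch.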
- Using transformers to assign ICD codes to medical notes (Stellenbosch : Stellenbosch University, 2023-03) Dreyer, Andrei Michael; Van der Merwe, Brink; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: International Classification of Disease (ICD) coding plays a significant role in classifying morbidity and mortality rates. Currently, ICD codes are assigned to a patient's medical record by hand by medical practitioners or specialist clinical coders. This practice is prone to errors, and training skilled clinical coders requires time and human resources. Automatic prediction of ICD codes can help alleviate this burden. In this research, we look at transformer-based architectures for predicting ICD codes. Firstly, we expand the size of an XLNet model with label-wise attention to determine whether an increase in model size leads to a better-performing model. We also consider two transformer-based architectures that are specifically designed to handle long input sequences, and compare their results with those of our best-performing XLNet model. Lastly, we evaluate different attention mechanisms with our XLNet model to determine which works best. We found three things: an increase in model size does lead to better results, XLNet performs better than the architectures designed for longer sequence lengths, and the label-wise attention used by our XLNet model performs better than the other attention mechanisms.
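Label-wise attention, as used on top of the XLNet encoder above, gives each ICD code its own attention distribution over the token representations, so different codes can focus on different parts of a long note. A toy numpy sketch of the mechanism (all sizes and parameter names are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(1)

T, d, L = 128, 64, 50  # tokens, hidden size, number of ICD codes (toy)
H = rng.normal(size=(T, d))          # token representations from the encoder
U = rng.normal(size=(L, d))          # one learnable attention query per label
W_out = 0.1 * rng.normal(size=(L, d))  # per-label classification weights
b = np.zeros(L)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Each label attends over all tokens, producing a label-specific
# document vector; each code then gets its own sigmoid score.
A = softmax(U @ H.T, axis=1)         # (L, T): attention weights per label
V = A @ H                            # (L, d): label-specific representations
logits = (W_out * V).sum(axis=1) + b
probs = 1.0 / (1.0 + np.exp(-logits))  # multi-label probabilities, one per code
print(probs.shape)
```

Because ICD coding is multi-label, each code receives an independent sigmoid score rather than competing in a single softmax over codes.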
- Solidifying what is known about calibration artefacts and the development of an educational tool to assist in the teaching of interferometric imaging (Stellenbosch : Stellenbosch University, 2023-03) Jackson, Jason Peter; Grobler, Trienko; Ludick, Danie; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: Radio interferometers are arrays of radio antennas that work together to capture celestial radio emission. Imaging involves transforming the raw measurements made by these so-called interferometers into images of the radio sky. The first contribution of this thesis is the creation of an educational tool that utilizes the Transient Array Radio Telescope (TART); this tool can be used to teach radio interferometric imaging to undergraduate and postgraduate students. Calibration is the act of correcting for effects that may have interfered with the celestial radio emission an interferometer receives. Calibration artefacts, or systematics, are inadvertently created when we calibrate our instrument. Calibrating with an incomplete sky model in particular can create artefacts called ghosts: spurious sources that do not truly exist. A second contribution of the thesis is the creation of a scientific tool with which calibration artefacts can be studied. This tool is then used to investigate which artefacts form when a single extended source is only partially modelled (with a point source model). The results of this study show that, for the aforementioned extended-source use-case, ghosts become extended sources themselves. They also alter the original extended source in various ways: the original source takes on the same flux scale as the source in the calibration model, and its profile changes, becoming more point-like.
The shorter baselines are also more severely affected than the longer baselines, and, in contrast to previous studies, for this particular setup the number of antennas does not impact the severity of the artefacts that are created.
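The calibration step being studied above can be sketched in miniature: per-antenna complex gains are solved for by fitting model visibilities to observed ones. The toy below is noise-free and uses a complete sky model, so calibration succeeds exactly; the thesis's ghost studies correspond to deliberately omitting sources from the model M. The antenna layout, source parameters, and the damped alternating-least-squares iteration (in the spirit of StEFCal-type solvers) are all illustrative, not the thesis's tool.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 7  # antennas (toy)

g_true = 1 + 0.1 * rng.normal(size=N) + 0.1j * rng.normal(size=N)
ant = rng.normal(size=(N, 2))  # toy antenna positions (in wavelengths)

def sky_vis(sources):
    """Noise-free visibilities for point sources given as (flux, l, m)."""
    V = np.zeros((N, N), dtype=complex)
    for flux, l, m in sources:
        e = np.exp(-2j * np.pi * (ant @ np.array([l, m])))
        V += flux * np.outer(e, e.conj())
    return V

M = sky_vis([(1.0, 0.0, 0.0), (0.2, 0.3, 0.1)])  # complete sky model
V_obs = np.outer(g_true, g_true.conj()) * M      # corrupted by antenna gains

# Damped alternating least squares: for fixed g_q, each g_p has a
# closed-form least-squares update from V_pq ~ g_p M_pq conj(g_q).
g = np.ones(N, dtype=complex)
for _ in range(500):
    g_old = g.copy()
    for p in range(N):
        z = M[p, :] * g_old.conj()          # model row scaled by conj gains
        g[p] = (V_obs[p, :] @ z.conj()) / (z @ z.conj()).real
    g = 0.5 * (g + g_old)                   # damping stabilises convergence

V_corr = V_obs / np.outer(g, g.conj())      # corrected visibilities
print(np.linalg.norm(V_corr - M) / np.linalg.norm(M))  # small residual
```

With an incomplete model (e.g. dropping the second source from M), the solved gains absorb the missing flux, and the corrected visibilities acquire the baseline-dependent residuals that manifest as ghosts in the image.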
- Implementation of the Cavalieri Integral (2023-02) van Zyl, Christoff; Grobler, Trienko; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: Cavalieri integration in Rⁿ presents a novel visualization mechanism for weighted integration and challenges the notion of strictly rectangular integration strips. It does so by concealing the integrator inside the boundary curves of the integral. This paper investigates the Cavalieri integral as a superset of Riemann integration in Rⁿ⁻¹, whereby the integral is defined by a translational region in Rⁿ⁻¹, which uniquely defines the integrand, integrator and integration region. In R², this refined translational-region definition allows for the visualization of Riemann–Stieltjes integrals along with other forms of weighted integration, such as the Riemann–Liouville fractional integral and the convolution operator. Programmatic implementation of such visualizations and computation of integral values are also investigated; these rely on numeric integration, algorithmic differentiation and numeric root finding. For the R³ case, such visualizations over polygonal regions require a mechanism for the triangulation of a set of nested polygons, together with transformations that allow repeated integration to be used to compute the integral value over the produced triangular regions using standard 1-dimensional integration routines.
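The Riemann–Stieltjes integrals that the R² visualizations build on are straightforward to approximate numerically. A minimal midpoint-rule sketch, unrelated to the paper's actual implementation:

```python
import numpy as np

def riemann_stieltjes(f, g, a, b, n=2000):
    """Midpoint approximation of the Riemann-Stieltjes integral
    of f with respect to the integrator g over [a, b]:
        sum over i of  f(m_i) * (g(x_{i+1}) - g(x_i)).
    """
    x = np.linspace(a, b, n + 1)
    mids = 0.5 * (x[:-1] + x[1:])
    return np.sum(f(mids) * np.diff(g(x)))

# Worked check: integral of x d(x^2) over [0, 1]
# equals integral of 2x^2 dx over [0, 1] = 2/3.
val = riemann_stieltjes(lambda x: x, lambda x: x**2, 0.0, 1.0)
print(val)  # approximately 0.6667
```

Here the increments of the integrator g replace the uniform strip widths of ordinary Riemann integration, which is exactly the weighting that the Cavalieri construction makes visible geometrically.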
- Rule Induction with Swarm Intelligence (Stellenbosch : Stellenbosch University, 2022-03) van Zyl, Jean-Pierre; Engelbrecht, Andries Petrus; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: Rule induction is the process by which explainable mappings are created between a set of input data instances and a set of labels for the input instances. This process can be seen as an extension of traditional classification algorithms, because rule induction algorithms perform classification but have the added property of being transparent when making inferences. Popular algorithms in existing literature tend to use antiquated approaches to induce rule sets. The existing approaches tend to be greedy in nature and do not provide a platform for algorithm expansion or improvement. This thesis investigates a new approach to rule induction using a set-based particle swarm optimisation algorithm. The investigation starts with a comprehensive review of the relevant literature, after which the novel algorithm is proposed and compared with popular rule induction algorithms. After the establishment of the capabilities and validity of the set-based particle swarm optimisation rule induction algorithm, the effect of the objective function on the algorithm is investigated. The objective function is tested with 12 existing performance evaluation metrics in order to understand how the performance of the algorithm can be improved. These 12 existing metrics are then used as inspiration for the proposal of 11 new performance evaluation metrics, which are also tested as part of the objective function effect analysis. The effect of varying distributions of the values of the target class is also examined.
This thesis also investigates the reformulation of the rule induction problem as a multi-objective optimisation problem, and applies the newly developed multi-guide set-based particle swarm optimisation algorithm to the multi-objective formulation of rule induction. The performance of rule induction as a multi-objective problem is evaluated by examining how the trade-off between the defined objective functions affects performance for different datasets. The existing metrics and newly proposed metrics tested in the single-objective formulation of the rule induction problem are also tested in the multi-objective formulation.
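What makes a set-based swarm a natural fit here is that a candidate rule is literally a set of conditions, so particles can move through the space of condition sets. A toy illustration of that representation with a precision-times-coverage objective; the dataset, names, and metric are illustrative only, not the thesis's 12 existing or 11 proposed metrics:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy categorical dataset: rows of feature values plus a binary target
# that depends only on feature 0.
X = rng.integers(0, 3, size=(200, 4))
y = (X[:, 0] == 1).astype(int)

def rule_score(rule, X, y, target=1):
    """Score a rule, represented as a set of (feature, value)
    conditions, by the product of its precision and coverage."""
    mask = np.ones(len(X), dtype=bool)
    for feat, val in rule:
        mask &= X[:, feat] == val
    covered = mask.sum()
    if covered == 0:
        return 0.0
    precision = (y[mask] == target).mean()
    coverage = covered / len(X)
    return precision * coverage

good = {(0, 1)}            # "feature 0 == 1 -> class 1"
noisy = {(0, 1), (2, 0)}   # an extra condition shrinks coverage
print(rule_score(good, X, y), rule_score(noisy, X, y))
```

An objective of this shape gives the swarm a single scalar to optimise per rule; swapping in different evaluation metrics, as the thesis does, only changes `rule_score`, not the set-based search itself.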