- ItemLandscape aware algorithm selection for feature selection(Stellenbosch : Stellenbosch University, 2023-10) Mostert, Werner; Engelbrecht, Andries Petrus; Malan, Katherine Mary; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Computer Science Division.ENGLISH ABSTRACT: Feature selection is commonly applied as a pre-processing technique for machine learning to reduce the dimensionality of a problem by removing redundant and irrelevant features. Another desirable outcome of feature selection lies in the potential performance improvement of predictive models. The development of new feature selection algorithms are common within the field, however, relatively little research has historically been done to better understand the feature selection problem from a theoretical perspective. Researchers and practitioners in the field often rely on a trial-and-error strategy to decide on which feature selection algorithm to use for a specific instance of a machine learning problem. This thesis contributes towards a better understanding of the complex feature selection problem by investigating the link between feature selection problem characteristics and the performance of feature selection algorithms. A variety of fitness landscape analysis techniques are used to gain insights into the structure of the feature selection fitness landscape. Performance complementarity for feature selection algorithms is empirically shown, emphasising the potential value of automated algorithm selection for feature selection algorithms. Towards the realisation of a landscape aware algorithm selector for feature selection, a novel performance metric for feature selection algorithms is presented. The baseline fitness improvement (BFI) performance metric is unbiased and can be used for comparative analysis across feature selection problem instances. The insights obtained via landscape analysis are used with other meta-features of datasets and the BFI performance measure to develop a new landscape aware algorithm selector for feature selection. The landscape aware algorithm selector provides a human-interpretable predictive model of the best feature selection algorithm for a specific dataset and classification problem.
- ItemUntitled(Stellenbosch : Stellenbosch University, 2023-03) Raselimo, Moeketsi; Fischer, Bernd; Stellenbosch University. Faculty of Science. Dept. of Computer Science.ENGLISH ABSTRACT: Context-free grammars (CFGs) or simply grammars, are widely used to describe and encode the structure of complex objects. Systems that process such structured objects are known as grammarware. CFGs are also used for software engineering tasks such as testing grammarware, specifically in test suite generation and fuzzing. However, correct and complete CFGs are rarely available, which limits the benefits of these automated techniques. While current grammar engineering approaches can demonstrate the presence of faults in CFGs, none of them can be used to confirm the locations of these faults, much less repair them. In this thesis, we address this problem and develop, implement and evaluate novel automated methods that find locations of faults in CFGs and repair them against a test suite specification. We describe and evaluate the first spectrum-based method aimed at finding faults CFGs. In its basic form, it takes as input a test suite and a modified parser for the grammar that can collect grammar spectra, i.e., the sets of grammar elements used in attempts to parse the individual test cases, and returns as output a ranked list of suspicious elements. We define grammar spectra suitable for localizing faults on the level of the grammar rules and at the rules’ individual symbols, respectively. We show how these types of spectra can be collected in widely used parsing tools. We also show how the spectra can be constructed directly from test cases derived from a grammar, and how these synthetic spectra can be used for localization in cases where standalone parsers implementing the grammars are not available. We qualitatively and quantitatively demonstrate the effectiveness of our fault localization approach. We first evaluate our method over a large number of medium-sized single fault CFGs, which we constructed by fault seeding from a common origin grammar. At the rule level, it ranks the rules containing the seeded faults within the top five rules in about 40%–70% of the cases, and pinpoints them (i.e., correctly identifies them as unique most suspicious rule) in about 10%–30% of the cases, with significantly better results for the synthetic spectra. On average, it ranks the faulty rules within about 25% of all rules. At the item level, our method remains remarkably effective despite the larger number of possible locations: it typically ranks the seeded faults within the top five positions in about 30%–60% of the cases, and pinpoints them in about 15%–40% of the cases. On average, it ranks the seeded faults within about 10%–20% of all positions. We further evaluate our method over CFGs that contain real faults. We show that an iterative approach can be used to localize and manually remove one by one multiple faults in grammars submitted by students enrolled in various compiler engineering courses; in most iterations, the topranked rule already contains an error, and no error is ranked outside the top five ranked rules. We finally apply our method to a large open-source SQLite grammar and show where the original version deviates from the language accepted by the actual SQLite system. We then describe the first method to automatically repair faults in CFGs: given a grammar that fails some tests in a given test suite, we iteratively and gradually transform the grammar until it passes all tests. Our core idea is to build on spectrum-based fault localization to identify promising repair sites (i.e., specific positions in rules), and to apply grammar patches at these sites whenever they satisfy explicitly formulated pre-conditions necessary to potentially improve the grammar. We implement and evaluate passive and active repair variants of our approach. In passive repair, we repair against the fixed input test suite as specification. The active repair variant takes a black-box parser for the unknown target language or oracle that can answer membership queries. The key extension of active repair is to incorporate some test suite enrichment, by generating additional tests from each repair candidate and using the oracle to confirm the outcome of each of these tests. We demonstrate the effectiveness of both repair variants using thirtythree student grammars that contain multiple faults. We show that both variants are effective in fixing real faults in these grammars. A like-for-like comparison of both variants shows that active repair produces more high quality repairs than passive repair.
- ItemMixtures of heterogeneous experts(Stellenbosch : Stellenbosch University, 2022-10) Omomule, Taiwo Gabriel; Engelbrecht, Andries; Stellenbosch University. Faculty of Science. Dept. of Computer Science.ENGLISH ABSTRACT: This research considers the problem of the No-Free-Launch-Theorem, which states that no one machine learning algorithm performs best on all problems due to algorithms having different inductive biases. Another problem is that the combinations of experts of the same type, referred to as a mixture of homogeneous experts, do not capitalize on the different inductive biases of different machine learning algorithms. Research has shown that mixtures of homogeneous experts deliver improved accuracy compared to that of the base experts in the mixture. However, the predictive power of a homogeneous mixture of experts is still limited by the inductive bias of the algorithm that makes up the mixture of experts. Therefore, this research proposes the development of mixtures of heterogeneous experts through the combination of different machine learning algorithms to take advantage of the strengths of the machine learning algorithms and to reduce the adverse effects of the inductive biases of the different algorithms. A set of different machine learning algorithms are selected to develop four different types of mixtures of experts in the research. Empirical analyses are performed using nonparametric statistical tests to compare the generalization performance of the ensembles. The comparison is carried out to investigate the performance of the homogeneous and heterogeneous ensembles in a number of modelling studies examined on a set of classification and regression problems using selected performance measures. The problems represent varying levels of complexity and characteristics to determine the characteristics and complexities for which the heterogeneous ensembles outperform homogeneous ensembles. For classification problems, the empirical results across six modelling studies indicate that heterogeneous ensembles generate improved predictive performance compared to the developed homogeneous ensembles by taking advantage of the different inductive biases of the different base experts in the ensembles. Specifically, the heterogeneous ensembles developed using different machine learning algorithms, with the same and different configurations, showed superiority over other heterogeneous ensembles and the homogeneous ensembles developed in this research. The ensembles achieved the best and second-best overall accuracy rank across the classification datasets in each modelling study. For regression problems, the heterogeneous ensembles outperformed the homogeneous ensembles across five modelling studies. Although, a random forest algorithm achieved competitive generalization performance compared to that of the heterogeneous ensembles. Based on the average ranks, the heterogeneous ensembles developed using different machine learning algorithms where the base members consist of the same and different configurations still produced better predictive performance than a number of heterogeneous ensembles and homogeneous ensembles across the modelling studies. Therefore, the implementation of a mixture of heterogeneous experts removes the need for the computationally expensive process of finding the best performing homogeneous ensemble. The heterogeneous ensembles of different machine learning algorithms are consistently the most or one of the most accurate ensembles across all classification and regression problems. This is attributed to the advantage of capitalizing on the inductive biases of the different machine learning algorithms and the different configurations of the base members in the ensembles.
- ItemActive strategies for coordination of solitary robots(Stellenbosch : Stellenbosch University, 2020-12) Masakuna, Jordan Felicien; Utete, Simukai Wanzira; Kroon, R. S. (Steve); Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Division Computer Science.ENGLISH ABSTRACT: This thesis considers the problem of search of an unknown environment by multiple solitary robots: self-interested robots without prior knowledge about each other, and with restricted perception and communication capacity. When solitary robots accidentally interact with each other, they can leverage each other’s information to work more effectively. In this thesis, we consider three problems related to the treatment of solitary robots: coordination, construction of a view of the network formed when robots interact, and classifier fusion. Coordination is the key focus for search and rescue. The other two problems are related areas inspired by the problems we encountered while developing our coordination method. We propose a coordination strategy based on cellular decomposition of the search environment, which provides sustainable performance when a known available search time (bound) is insufficient to cover the entire search environment. A sustainable performance is achieved when robots that know about each other explore non-overlapping regions. For network construction, we propose modifications to a scalable decentralised method for constructing a model of network topology which reduces the number of messages exchanged between interacting nodes. The method has wider potential application than mobile robotics. For classifier fusion, we propose an iterative method where outputs of classifiers are combined without using any further information about the behaviour of the individual classifiers. Our approaches for each of these problems are compared to state-of-the-art methods.
- ItemOn noise regularised neural networks: initialisation, learning and inference(Stellenbosch : Stellenbosch University, 2019-12) Pretorius, Arnu; Kroon, R. S. (Steve); Kamper, M. J.; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Division Computer Science.ENGLISH ABSTRACT: Innovation in regularisation techniques for deep neural networks has been a key factor in the rising success of deep learning. However, there is often limited guidance from theory in the development of these techniques and our understanding of the functioning of various successful regularisation techniques remains impoverished. In this work, we seek to contribute to an improved understanding of regularisation in deep learning. We specifically focus on a particular approach to regularisation that injects noise into a neural network. An example of such a technique which is often used is dropout (Srivastava et al., 2014). Our contributions in noise regularisation span three key areas of modeling: (1) learning, (2) initialisation and (3) inference. We first analyse the learning dynamics of a simple class of shallow noise regularised neural networks called denoising autoencoders (DAEs) (Vincent et al., 2008), to gain an improved understanding of how noise affects the learning process. In this first part, we observe a dependence o f learning behaviour on initialisation, which leads us to study how noise interacts with the initialisation of a deep neural network in terms of signal propagation dynamics during the forward and backward pass. Finally, we consider how noise affects inference in a Bayesian context. We mainly focus on fully-connected feedforward neural networks with rectifier linear unit (ReLU) activation functions throughout this study. To analyse the learning dynamics of DAEs, we derive closed form solutions to a system of decoupled differential equations that describe the change in scalar weights during the course of training as they approach the eigenvalues of the input covariance matrix (under a convenient change of basis). In terms of initialisation, we use mean field theory to approximate the distribution of the pre-activations of individual neurons, and use this to derive recursive equations that characterise the signal propagation behaviour of the noise regularised network during the first forward and backward pass o f training. Using these equations, we derive new initialisation schemes for noise regularised neural networks that ensure stable signal propagation. Since this analysis is only valid at initialisation, we next conduct a large-scale controlled experiment, training thousands of networks under a theoretically guided experimental design, for further testing the effects of initialisation on training speed and generalisation. To shed light on the influence of noise on inference, we develop a connection between randomly initialised deep noise regularised neural networks and Gaussian processes (GPs)—non-parametric models that perform exact Bayesian inference—and establish new connections between a particular initialisation of such a network and the behaviour of its corresponding GP. Our work ends with an application of signal propagation theory to approximate Bayesian inference in deep learning where we develop a new technique that uses self-stabilising priors for training deep Bayesian neural networks (BNNs). Our core findings are as follows: noise regularisation helps a model to focus on the more prominent statistical regularities in the training data distribution during learning which should be useful for later generalisation. However, if the network is deep and not properly initialised, noise can push network signal propagation dynamics into regimes of poor stability. We correct this behaviour with proper “noise-aware” weight initialisation. Despite this, noise also limits the depth to which networks are able to train successfully, and networks that do not exceed this depth limit demonstrate a surprising insensitivity to initialisation with regards to training speed and generalisation. In terms of inference, noisy neural network GPs perform best when their kernel parameters correspond to the new initialisation derived for noise regularised networks, and increasing the amount of injected noise leads to more constrained (simple) models with larger uncertainty (away from the training data). Lastly, we find our new technique that uses self-stabilising priors makes training deep BNNs more robust and leads to improved performance when compared to other state-of-the-art approaches.