Masters Degrees (Computer Science)
Browsing Masters Degrees (Computer Science) by Title
Now showing 1 - 20 of 42
- Item: Application of statistical pattern recognition and deep learning for morphological classification in radio astronomy (Stellenbosch : Stellenbosch University, 2022-04) Becker, Adolf Burger; Grobler, Trienko; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: The morphological classification of radio sources is important to gain a full understanding of galaxy evolution processes and their relation with local environmental properties. Furthermore, the complex nature of the problem, its appeal for citizen scientists and the large data rates generated by existing and upcoming radio telescopes combine to make the morphological classification of radio sources an ideal test case for the application of machine learning techniques. One approach that has shown great promise recently is Convolutional Neural Networks (CNNs). The literature, however, lacks two major things when it comes to CNNs and radio galaxy morphological classification. Firstly, a proper analysis to identify whether overfitting occurs when training CNNs to perform radio galaxy morphological classification is needed. Secondly, a comparative study regarding the practical applicability of the CNN architectures in the literature is required. Both of these shortcomings are addressed in this thesis. Multiple performance metrics are used for the latter comparative study, such as inference time, model complexity, computational complexity and mean per-class accuracy. A ranking system based upon recognition and computational performance is proposed. MCRGNet, ATLAS and ConvXpress (a novel classifier) are the architectures that best balance computational requirements with recognition performance.
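As a concrete illustration of the kind of classifier compared in the thesis above, the sketch below shows a small convolutional network for radio-source cutouts. It is a hedged example only: the input resolution, layer sizes and four-class output are assumptions, and it is not MCRGNet, ATLAS or ConvXpress.

```python
# Minimal sketch of a CNN morphological classifier (hypothetical sizes; PyTorch).
import torch
import torch.nn as nn

class SmallRadioCNN(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):  # x: (batch, 1, 64, 64) grayscale source cutouts
        return self.classifier(self.features(x))

model = SmallRadioCNN()
logits = model(torch.randn(8, 1, 64, 64))  # -> (8, 4) class scores
```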
- Item: Automatic assignment of diagnosis codes to free-form text medical notes (Stellenbosch : Stellenbosch University, 2021-12) Strydom, Stefan; Van der Merwe, Brink; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Division Computer Science. ENGLISH ABSTRACT: Clinical coding is the process of describing and categorising healthcare episodes according to standardised ontologies. The coded data have important downstream applications, including population morbidity studies, health systems planning and reimbursement. Clinical codes are generally assigned by specialist human coders based on information contained in free-form text clinical notes. This process is expensive, time-consuming, subject to human error and burdens scarce clinical human resources with administrative roles. An accurate automatic coding system can alleviate these problems. Clinical coding is a challenging task for machine learning systems. The source texts are often long, have a highly specialised vocabulary and contain non-standard clinician shorthand, and the code sets can contain tens of thousands of codes. We review previous work on clinical auto-coding systems and perform an empirical analysis of widely used and current state-of-the-art machine learning approaches to the problem. We propose a novel attention mechanism that takes the text description of clinical codes into account. We also construct a small pre-trained transformer model that achieves state-of-the-art performance on the MIMIC-II and MIMIC-III ICD-9 auto-coding tasks. To the best of our knowledge, it is the first successful application of a pre-trained transformer model to this task.
- Item: Automatic Prediction of Comment Quality (Stellenbosch : Stellenbosch University, 2016-03) Brand, Dirk Johannes; Van der Merwe, Brink; Kroon, R. S. (Steve); Cleophas, Loek; Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Computer Science). ENGLISH ABSTRACT: The problem of identifying and assessing the quality of short texts (e.g. comments, reviews or web searches) has been intensively studied. There are great benefits to being able to analyse short texts. As an example, advertisers might be interested in the sentiment of product reviews on e-commerce sites to more efficiently pair marketing material to content. Analysing short texts is a difficult problem, because traditional machine learning models generally perform better on data sets with larger samples, which often translates to more features. More data allow for better estimation of parameters for these models. Short texts generally do not have much content, but still carry high variability in that they may still consist of a large corpus of words. This thesis investigates various methods for feature extraction for short texts in the context of online user comments. These methods include the leading manual feature extraction techniques for short texts, N-gram models and techniques based on word embeddings. The effect of using different kernels for a support vector classifier is also investigated. The investigation is centred around two data sets, one provided by News24 and the other extracted from Slashdot.org. It was found that N-gram models performed relatively well, mostly outperforming manual feature extraction techniques.
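For readers unfamiliar with the N-gram baseline referred to in the abstract above, the following sketch shows one conventional way to combine N-gram features with a support vector classifier. The toy comments, labels and kernel choice are illustrative assumptions, not the News24 or Slashdot.org data.

```python
# Minimal sketch: N-gram (unigram + bigram) TF-IDF features feeding an SVM classifier.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

comments = ["great insightful comment", "spam spam click here", "thoughtful reply with sources"]
labels = [1, 0, 1]  # 1 = good quality, 0 = poor quality (toy labels)

clf = Pipeline([
    ("ngrams", TfidfVectorizer(ngram_range=(1, 2))),  # N-gram feature extraction
    ("svm", SVC(kernel="linear")),                     # the kernel is one axis of the comparison
])
clf.fit(comments, labels)
print(clf.predict(["another spam comment click here"]))
```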
- Item: Automatic recognition and interpretation of finite state automata diagrams (Stellenbosch : Stellenbosch University, 2015-12) Babalola, Olusola Tope; Van Zijl, Lynette; Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Computer Science). ENGLISH ABSTRACT: An application capable of reading graphically-encoded information is beneficial to blind or visually impaired students. Such a system needs to recognize and understand visual markings and their arrangement as presented in a diagram image. In that light, this thesis examines the practical possibility of a real-world system for the automatic recognition and interpretation of machine-printed Finite State Automata diagrams. The suggested system uses known image processing and pattern recognition methods to extract the visual markings from the diagram image pixels. A second stage, to interpret the meaning of the diagram, is based on modeling the language of Finite State Automata diagrams using Constraint Multiset Grammars. Our results show that a practical application for automatic interpretation of Finite State Automata diagrams is possible.
- Item: Building identification in aerial imagery using deep learning (Stellenbosch : Stellenbosch University, 2024-03) Nakiranda, Proscovia; Grobler, Trienko; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: Advancements in the field of remote sensing have facilitated the effortless retrieval of information about any location on Earth at any given time. This has resulted in a variety of developments, including the identification of economic activities taking place in a particular area. The task of identifying buildings is one significant application of remote sensing imagery, as it is crucial for assessing resource distribution, especially in low-resource or data-scarce areas. Advances in machine learning and computation resources allow for automatic analysis of collected remote sensing data, eliminating the need for human intervention. The task of building identification falls under computer vision and is an example of a task that can be automated. Several machine learning architectures (models) have been proposed for building identification. However, choosing the appropriate one can be challenging due to limitations such as difficulty in accurately identifying boundary or near-boundary pixels, resource requirements, and overall model accuracy. Therefore, conducting a comparative study is necessary to evaluate the performance of the building identification models. In this thesis, we carry out a comparative study of four state-of-the-art models used for the building identification task. We evaluate their performance both qualitatively and quantitatively. Furthermore, we investigate the effect of multitask learning on the models’ performance in building identification. The thesis concludes by providing our research findings and outlining prospective future research avenues. Moreover, it provides a thorough overview of the fundamental theory underpinning remote sensing and machine learning.
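One standard way to quantify building-identification quality, in the spirit of the quantitative evaluation mentioned above, is per-image intersection over union (IoU) between predicted and reference building masks. The sketch below is illustrative only and is not claimed to be the thesis's exact metric.

```python
# Per-image IoU for binary building masks (illustrative).
import numpy as np

def iou(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return float(intersection) / union if union > 0 else 1.0

pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
true = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
print(iou(pred, true))  # 2 / 4 = 0.5
```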
- Item: Combining reverse debugging and live programming towards visual thinking in computer programming (Stellenbosch : Stellenbosch University, 2015-03) Coetzee, Abraham Liebrecht; Van Zijl, Lynette; Hoffmann, McElory R.; Stellenbosch University. Faculty of Science. Mathematical Sciences (Computer Science). ENGLISH ABSTRACT: Interaction plays a key role in the process of learning, and a learner’s abilities are enhanced when multiple cognitive functions work in parallel, especially those related to language and visuals. Time is the most fundamental variable that governs the interaction between programmer and computer, and the substantial temporal separation of cause and effect leads to poor mental models. Furthermore, programmers do not have means by which to express their mental models. The feasibility of combining reverse debugging and live programming was therefore investigated. This combination was found to be feasible, and a reverse debugger with higher levels of liveness was created for the Python programming language. It establishes a foundation for combining language and visual models as aids in computer programming education.
- Item: Combining tree kernels and text embeddings for plagiarism detection (Stellenbosch : Stellenbosch University, 2018-03) Thom, Jacobus Daniel; Van der Merwe, A. B.; Kroon, R. S. (Steve); Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences (Computer Science). ENGLISH ABSTRACT: The internet allows for vast amounts of information to be accessed with ease. Consequently, it becomes much easier to plagiarize any of this information as well. Most plagiarism detection techniques rely on n-grams to find similarities between suspicious documents and possible sources. N-grams, due to their simplicity, do not make full use of all the syntactic and semantic information contained in sentences. We therefore investigated two methods, namely tree kernels applied to the parse trees of sentences and text embeddings, to utilize more syntactic and semantic information respectively. A plagiarism detector was developed using these techniques and its effectiveness was tested on the PAN 2009 and 2011 external plagiarism corpora. The detector achieved results that were on par with the state of the art for both PAN 2009 and PAN 2011. This indicates that the combination of tree kernel and text embedding techniques is a viable method of plagiarism detection.
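The text-embedding side of such a detector can be reduced to comparing sentence vectors, as sketched below. The embedding vectors are assumed to come from some external model (hypothetical inputs), and the tree-kernel component over parse trees is not shown.

```python
# Flag suspicious/source sentence pairs whose embeddings are highly similar (illustrative).
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def flag_similar(suspicious: list, source: list, threshold: float = 0.9):
    """Return index pairs (i, j) of suspicious/source sentence vectors above the threshold."""
    return [(i, j)
            for i, u in enumerate(suspicious)
            for j, v in enumerate(source)
            if cosine(u, v) >= threshold]

# Example with random stand-in embeddings; real vectors would come from an embedding model.
rng = np.random.default_rng(0)
print(flag_similar([rng.normal(size=8)], [rng.normal(size=8)], threshold=0.1))
```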
- Item: Concrete and symbolic linearisability checking of non-blocking concurrent data structures (Stellenbosch : Stellenbosch University, 2021-12) Du Toit, Nicole Cathryn; Inggs, Cornelia P.; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Division Computer Science. ENGLISH ABSTRACT: Non-blocking concurrent data structures were developed as a more efficient alternative to lock-based concurrent data structures; in non-blocking concurrent data structures, hardware-level atomic instructions are used instead of higher-level, expensive locking mechanisms. Lock-free algorithms, however, are notoriously hard to design and prone to subtle concurrency errors that are difficult to pick up. Linearisability is the standard correctness condition for non-blocking concurrent data structures; a data structure is linearisable if each concurrent execution of the data structure corresponds to an execution of its correct sequential specification. In this thesis, the focus is on the linearisability checking of non-blocking data structures using a model checker. The approaches for checking linearisability using a model checker can be broadly categorised into linearisation-point and automatic linearisability checking. The state-of-the-art strategies were implemented using the Java PathFinder model checker as basis. The linearisation-point linearisability checking strategy of Vechev et al. was extended to include data structures with operations that act generically on the data structure, and not just on one element in the data structure. An improved version of Doolan et al.’s external automatic checker was implemented and the idea of an external checker was extended to the improved linearisation-point checking strategy. The lazy read optimisation proposed by Long et al., and a hash optimisation proposed in this thesis, were implemented for the automatic checker, and the effectiveness and benefit of the optimisations determined. The performance-limiting factor of the automatic checker was investigated and the claims made by Vechev et al., Liu et al., and Doolan et al. confirmed or refuted. The concrete checker’s usefulness in finding linearisability errors is constrained by the user’s ability to hand-craft test cases in which errors are present. A new symbolic linearisability checker was developed, the major novel contribution of this thesis, which integrates linearisability checking into Symbolic PathFinder, a symbolic model checker. The symbolic checker performs linearisability checking on all possible test cases and program paths; it verifies the linearisability of a data structure in general, constrained only by a user-defined number of operations to be executed by each thread. Finally, extensive evaluations and comparisons of all checkers were performed, on the same model checking framework and hardware, considering their required manual input, resource usage, scalability, and ability to find errors.
- Item: Coverage directed algorithms for test suite construction from LR-automata (Stellenbosch : Stellenbosch University, 2022-04) Rossouw, Christoffel Jacobus; Fischer, Bernd; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: Bugs in software can have disastrous results in terms of both economic cost and human lives. Parsers can have bugs, like any other type of software, and must therefore be thoroughly tested in order to ensure that a parser recognizes its intended language accurately. However, parsers often need to recognize many different variations and combinations of grammar structures, which can make it time-consuming and difficult to construct test suites by hand. We therefore require automated methods of test suite construction for these systems. Currently, the majority of test suite construction algorithms focus on the grammar describing the language to be recognized by the parser. In this thesis we take a different approach. We consider the LR-automaton that recognizes the target language and use the context information encoded in the automaton. Specifically, we define a new class of algorithms and coverage criteria over a variant of the LR-automaton, which we call an LR-graph. We define methods of constructing positive test suites, using paths over this LR-graph, as well as mutations on valid paths to construct negative test suites. We evaluate the performance of our new algorithms against other state-of-the-art algorithms. We do this by comparing coverage achieved over various systems, some smaller systems used in a university-level compilers course and other larger, real-world systems. We find that our algorithms perform well over these systems when compared to algorithms that produce test suites of equivalent size. Our evaluation has uncovered a problem in grammar-based testing algorithms that we call bias. Bias can lead to significant variation in coverage achieved over a system, which can in turn lead to a flawed comparison of two algorithms or unrealized performance when a test suite is used in practice. We therefore define bias and measure it for all grammar-based test suite construction algorithms we use in this thesis.
- Item: Creating 3D models using reconstruction techniques (Stellenbosch : Stellenbosch University, 2018-12) Martin, Javonne Jason; Kroon, R. S. (Steve); De Villiers, H. A. C.; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Division Computer Science. ENGLISH ABSTRACT: Virtual reality models of real world environments have a number of compelling applications, such as preserving the architecture and designs of older buildings. This can be achieved by using 3D artists to reconstruct the environment, but that is a long and expensive process. This thesis therefore investigates various techniques and approaches used in 3D reconstruction of environments using a single RGB-D camera and aims to reconstruct the 3D environment to generate a 3D model. This would allow non-technical users to reconstruct environments and use these models in business and simulations, such as selling real estate, and modifying pre-existing structures for renovation and planning. With recent improvements in virtual reality technology such as the Oculus Rift and HTC Vive, a user can be immersed into virtual reality environments created from real world structures. A system based on Kinect Fusion is implemented to reconstruct an environment and track the motion of the camera within the environment. The system is designed as a series of self-contained subsystems, which allows each of the subsystems to be modified, expanded upon or easily replaced by alternative methods. The system is made available as an open-source C++ project using Nvidia’s CUDA framework to aid reproducibility and provides a platform for future research. The system makes use of the Kinect sensor to capture information about the environment. A coarse-to-fine least squares approach is used to estimate the motion of the camera. In addition, the system employs a frame-to-model approach that uses a view of the estimated reconstruction of the model as the reference frame and the incoming scene data as the target. This minimises the drift with respect to the true trajectory of the camera. The model is built using a volumetric approach, with volumetric information implicitly stored as a truncated signed distance function. The system filters out noise in the raw sensor data by using a bilateral filter. A point cloud is extracted from the volume using an orthogonal ray caster, which enables an improved hole-filling approach. This allows the system to extract both the explicit and implicit structure from the volume. The 3D reconstruction is followed by mesh generation based on the point cloud. This is achieved by using an approach related to Delaunay triangulation, the ball-pivot algorithm. The resulting system processes frames at 30 Hz, enabling real-time point cloud generation, while the mesh generation occurs offline. The system is initially tested using Blender to generate synthetic data, followed by a series of real world tests. The synthetic data is used to test the presented system’s motion tracking against the ground truth. While the presented system suffers from the effects of drift over long frame sequences, it is shown to be capable of tracking the motion of the camera. This thesis finds that the ball-pivot algorithm can generate the edges and faces for synthetic point clouds, but it performs poorly on the noisy synthetic and real world data sets. Based on these results, it is recommended that the point cloud be preprocessed to remove noise before it is provided to the mesh generation algorithm, and that an alternative mesh generation technique that is more robust to noise be employed.
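The volumetric fusion step described above can be summarised by the standard truncated signed distance function (TSDF) update: each voxel keeps a truncated distance and a weight, and fuses new measurements as a weighted running average. The sketch below is a simplified single-voxel version under assumed truncation and weight limits, not the thesis's CUDA implementation.

```python
# Simplified per-voxel TSDF fusion (illustrative constants).
import numpy as np

def update_voxel(tsdf: float, weight: float, sdf_measurement: float,
                 truncation: float = 0.05, max_weight: float = 100.0):
    """Fuse one new signed-distance measurement into a voxel's running average."""
    d = float(np.clip(sdf_measurement, -truncation, truncation))  # truncate the distance
    new_tsdf = (tsdf * weight + d) / (weight + 1.0)               # weighted running average
    new_weight = min(weight + 1.0, max_weight)                    # cap the accumulated weight
    return new_tsdf, new_weight

print(update_voxel(tsdf=0.02, weight=10.0, sdf_measurement=-0.01))
```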
- Item: A deep framework for predictive maintenance (2021-12-01) Steyl, Charl Cilliers; Hoffmann, McElory R.; Grobler, Trienko Lups; Herbst, Barend Mattheus. ENGLISH ABSTRACT: Predictive maintenance (PdM) is a well-known maintenance approach that comprises two problems, machine prognostic modelling and maintenance scheduling. The objective of prognostic modelling is to predict faults in machine components such as aircraft engines, lithium-ion batteries or bearings. The objective of maintenance scheduling is to reduce the cost of performing maintenance once the future degradation behaviour of a component has been established. Sensors are used to monitor the degradation behaviour of components as they change over time. Supervised learning is a suitable solution for prognostic modelling problems, especially with the increase in sensor readings being collected with Internet of Things (IoT) devices. Prognostic modelling can be formulated as remaining useful life (RUL) estimation or machine state estimation; the former is a regression problem and the latter a classification problem. Long short-term memory (LSTM) recurrent neural networks (RNNs) are an extension of traditional RNNs that are effective at interpreting trends in the sensor readings and making longer-term estimations. An LSTM uses a window of sequential sensor readings when making prognostic estimates, which makes it less sensitive to local sensor variations and results in improved prognostic model performance. In this study we create a framework for implementing PdM approaches. The work consists of a codebase which can be used to create testable, comparable and repeatable prognostic modelling results and maintenance scheduling simulations. The codebase is designed to be extensible, to allow future researchers to standardise prognostic modelling results. The codebase is used to compare the prognostic modelling performance of an LSTM with traditional supervised prognostic modelling approaches such as random forests (RFs), gradient-boosted (GB) trees and support vector machines (SVMs). The prognostic models are tested on three well-known prognostic datasets: the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) aircraft engine, Center for Advanced Life Cycle Engineering (CALCE) battery, and Intelligent Maintenance Systems (IMS) bearing datasets. During the study we highlight factors that influence prognostic model performance, such as the effect of de-noising sensor readings and the size of the sample window used by the LSTM when making estimations. The results of the prognostic models are compared with previous studies, and the LSTM shows improved performance on the considered cases. The developed prognostic models are used to perform preventative maintenance scheduling with assumed costs in two simulations. The objective is first to compare the efficacy of traditional maintenance approaches, such as a mean time between failure (MTBF) strategy, with a PdM strategy, and second to investigate the effect of using a better-performing prognostic model (such as the LSTM) in a PdM strategy. The improvements are measured by the reduction in costs. Key words: predictive maintenance; remaining useful life; machine state estimation; preventative maintenance scheduling.
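As an illustration of the windowed LSTM formulation described above, the sketch below regresses remaining useful life from a fixed-length window of multivariate sensor readings. The sensor count, window length and hidden size are assumptions, not the framework's actual configuration.

```python
# Minimal sketch of a windowed LSTM regressor for remaining useful life (PyTorch).
import torch
import torch.nn as nn

class RULLSTM(nn.Module):
    def __init__(self, num_sensors: int = 14, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(num_sensors, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):             # x: (batch, window_length, num_sensors)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # predict RUL from the final time step

model = RULLSTM()
window = torch.randn(32, 30, 14)      # 32 windows of 30 sensor readings each
rul_estimate = model(window)          # -> (32, 1)
```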
- Item: Design and evaluation of a formula cache for SMT-based bounded model checking tools (Stellenbosch : Stellenbosch University, 2018-03) Breytenbach, Jean Anré; Fischer, Bernd; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences (Computer Science). ENGLISH ABSTRACT: Program verification is a computationally expensive and time-consuming process. Bounded model checking is a branch of program verification that produces first-order logic (FOL) formulas to be checked for satisfiability by an SMT solver. These formulas encode state transitions to states where property violations will occur, and the SMT solver attempts to find a list of variable assignments that would create a path to one of these states. Bounded model checking tools create these formulas by iteratively increasing an unwind bound k that dictates the number of transitions that can be present in a path; for each unwind bound k, all possible paths of length k are generated. Any state containing a property violation that is generated during unwind bound k − 1 should also be present during unwind bound k, with perhaps a longer path to reach it. Many of the same paths are therefore checked during each subsequent iteration, causing the SMT solver to potentially perform duplicate work. This thesis seeks to build and evaluate a formula cache in which to store parts of the formula for which the satisfiability is already known. During subsequent iterations the formula can be sliced by removing the parts that are found in the cache, providing smaller formulas for the SMT solver which should take less time to solve. Similar formula caches have already proven successful in the field of symbolic execution. Multiple techniques are described to modify the formulas to increase the likelihood of finding a match in the cache, and these techniques are combined in a multitude of ways to generate a collection of caching strategies. These strategies are then evaluated against multiple data sets to find the best-performing strategies and to identify the types of problems that benefit the most from caching. The results are then compared to the results of caching formulas during symbolic execution to gain insight into how these different approaches effectively implement caching.
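The caching idea above can be pictured as memoising satisfiability verdicts keyed by a normalised form of each sub-formula, so that later unwindings can skip solver calls for parts already seen. The sketch below is a deliberately naive illustration; whitespace normalisation stands in for the thesis's formula-modification techniques.

```python
# Toy formula cache: memoise satisfiability verdicts for normalised formula text.
import hashlib

class FormulaCache:
    def __init__(self):
        self._store = {}  # maps formula digest -> "sat" / "unsat"

    @staticmethod
    def _key(formula: str) -> str:
        normalised = " ".join(formula.split())  # stand-in for real normalisation
        return hashlib.sha256(normalised.encode()).hexdigest()

    def lookup(self, formula: str):
        return self._store.get(self._key(formula))

    def record(self, formula: str, verdict: str):
        self._store[self._key(formula)] = verdict

cache = FormulaCache()
cache.record("(assert (> x 0))", "sat")
print(cache.lookup("(assert  (> x 0))"))  # "sat" -- solver call avoided
```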
- Item: Detecting and quantifying resource contention in concurrent programs (Stellenbosch : Stellenbosch University, 2016-03) Venter, Dirk Willem; Inggs, Cornelia P.; Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Computer Science). ENGLISH ABSTRACT: Parallel programs, both shared-memory and message-passing programs, typically require the sharing of resources, for example software resources such as shared mutual-exclusion locks, and hardware resources such as caches and memory. Shared resources can only be used by one thread or process at a time. The competition for limited resources is called resource contention. The result of resource contention is delays while waiting for access to a resource and/or extra computational overhead to resolve the request for a resource. Thus, the performance of the program can be improved by identifying and reducing contention for shared resources. This study investigates the effect of individual types of contention for hardware and software resources in detail and discusses the three tools that were developed to identify and quantify the sources of contention in concurrent programs.
- Item: Ergo: a gesture-based computer interaction device (Stellenbosch : Stellenbosch University, 2024-03) Kane, Boyd Robert; Grobler, Trienko Lups; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: This thesis presents Ergo, a bespoke glove-based sensor suite designed to fully replace the regular QWERTY keyboard in terms of both number of input keys and speed of input. Ergo collects acceleration data from each of the user's 10 fingertips 40 times per second and, at the same rate, predicts which of 50 different hand gestures is being performed. The user does not need to explicitly mark the start or end of each gesture, as Ergo is able to automatically distinguish between intentional gestures made by the user and other non-gesture movements. When a known gesture is detected, a corresponding keystroke is emitted, allowing the user to “type” on their computer by performing gestures in sequence. Five different classification models are evaluated (Hidden Markov Models, Support Vector Machines, Cumulative Sum, and two different neural network architectures) and neural networks are found to be the most effective. The difference in difficulty between classification tasks which either do or do not include observations without intentional movement is also evaluated. The additional requirement to distinguish intentional gestures from other hand movements is found to increase the difficulty of the task significantly.
- Item: Explaining neural networks used for modeling credit risk (Stellenbosch : Stellenbosch University, 2021-03) Mohamed, Zhunaid; Visser, Willem; Herbst, B. M.; Hoffman, McElory; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Division Computer Science. ENGLISH ABSTRACT: Calculating risk before providing loans is a common problem that credit companies face. The most common solution is for credit employees to manually assess the risk of a client by reviewing their credit portfolio. This can be a slow process and is prone to human error. Recently, credit companies have been adopting machine learning techniques in order to automate this process; however, this has been limited to linear techniques because interpretability is a strict requirement. Neural networks could provide significant improvements to the way credit risk is modeled; however, these are still seen as black boxes. In this work we compare various techniques which claim to provide interpretability for these black boxes. We also use these techniques to provide explanations of a neural network trained on credit data that has been provided to us.
- Item: An extension of the linear regression model for improved vessel trajectory prediction utilising a priori AIS Information (Stellenbosch : Stellenbosch University, 2022-04) Burger, Christiaan Neil; Grobler, Trienko Lups; Kleynhans, Waldo; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: As maritime activities increase globally, there is a greater dependency on technology for the monitoring, control and surveillance of vessel activity. One of the most prominent systems for monitoring vessel activity is the Automatic Identification System (AIS). An increase in both vessels fitted with AIS transponders and satellite and terrestrial receivers has resulted in a significant increase in AIS messages received globally. The resulting rich spatial and temporal data source related to vessel activity provides analysts with the ability to perform enhanced vessel movement analytics, of which a pertinent example is the improvement of vessel location predictions. In this thesis, we propose a novel method for predicting future locations of vessels by making use of historic AIS data. The proposed method extends a linear regression model (LRM) with historic AIS movement data in the form of a priori generated spatial maps of the course over ground (LRMAC). The LRMAC has low complexity, is easy to implement programmatically, and attains accurate prediction results. We first compare the LRM with a Discrete Kalman Filter (DKF) on linear trajectories, and then extend the LRM to form the LRMAC. The LRMAC is compared to another method in the literature called the Single Point Neighbour Search (SPNS). For the use case of predicting cargo and tanker vessel trajectories, with a prediction horizon of up to six hours, the LRMAC has an improved execution time and performance compared to the SPNS.
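For orientation, the LRM baseline amounts to fitting position linearly in time over recent AIS fixes and extrapolating, as in the sketch below (toy coordinates; the a priori course-over-ground maps that define the LRMAC are not shown).

```python
# Minimal linear-regression extrapolation of a vessel track (illustrative data).
import numpy as np

def predict_position(times, lats, lons, horizon_s: float):
    """Fit lat/lon linearly in time and extrapolate `horizon_s` seconds past the last fix."""
    t = np.asarray(times, dtype=float)
    lat_coef = np.polyfit(t, np.asarray(lats, dtype=float), deg=1)
    lon_coef = np.polyfit(t, np.asarray(lons, dtype=float), deg=1)
    t_future = t[-1] + horizon_s
    return np.polyval(lat_coef, t_future), np.polyval(lon_coef, t_future)

# Example: fixes 10 minutes apart, predicted one hour after the last fix.
print(predict_position([0, 600, 1200], [-34.10, -34.08, -34.06], [18.40, 18.44, 18.48], 3600))
```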
- Item: The generation of longest optimal box repetition-free words (Stellenbosch : Stellenbosch University, 2022-04) Habeck, Manfred; Grobler, Trienko; Van Zijl, Lynette; Geldenhuys, Jaco; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: This thesis focuses on a specific problem within the field of combinatorial generation, namely the generation of box repetition-free words. A box is a word over a given alphabet in which the first symbol is the same as the last symbol. For example, the word abaca is a box. A box can contain other boxes: the box abaca contains the boxes aba and aca. Boxes can overlap, such as aba and aca in abaca. This work investigates the generation of the longest possible sequence of symbols, over a given alphabet, which does not contain any repeating boxes. We show that an exhaustive enumeration based on a brute-force approach with backtracking is not feasible. That is, we check whether adding a symbol to a word would create a repeating box; if not, another symbol is added recursively. This method will eventually find all valid words, but takes an unreasonable amount of time for larger alphabets. As a non-enumerative attempt to find individual valid words, the Monte Carlo tree search is used. The search is based on the assumption that prefixes with good past results will also give good results in the future. Based on an analysis of the properties of box repetition-free words, a new search is devised: factors of words are mapped onto a graph, and all non-optimal edges are removed. It is then shown that any Hamiltonian path on this graph will result in a longest optimal word. The results of this work show that backtracking fails to generate longest optimal words within a reasonable time for any alphabet with more than three symbols. The Monte Carlo tree search performs better than backtracking, finding optimal words for an alphabet size of four, but failing for larger alphabets. The new method outperforms both and, with a small optimization, is shown to generate longest optimal words up to an alphabet size of six.
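The brute-force backtracking approach described above can be sketched as follows: grow a word one symbol at a time and backtrack as soon as the suffix just completed repeats a box already present in the word. This is an illustrative reading of the definition (boxes of length at least two, with overlapping occurrences counted as repetitions) and, as the abstract notes, is only practical for very small alphabets.

```python
# Exhaustive backtracking search for a longest box-repetition-free word (illustrative).
def creates_repeated_box(word: str) -> bool:
    """Only boxes ending at the newly added symbol need checking; earlier ones were checked before."""
    last = word[-1]
    for start in range(len(word) - 1):
        if word[start] == last:                         # factor word[start:] is a box
            box = word[start:]
            if word.find(box, 0, len(word) - 1) != -1:  # the same box occurs earlier in the word
                return True
    return False

def longest_box_free(alphabet: str, word: str = "") -> str:
    best = word
    for symbol in alphabet:
        candidate = word + symbol
        if not creates_repeated_box(candidate):
            extension = longest_box_free(alphabet, candidate)
            if len(extension) > len(best):
                best = extension
    return best

print(longest_box_free("ab"))  # feasible only for very small alphabets
```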
- Item: Hierarchical text classification with transformer-based language models (Stellenbosch : Stellenbosch University, 2024-03) Du Toit, Jaco; Dunaiski, Marcel; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: Hierarchical text classification (HTC) is a natural language processing (NLP) task which has the objective of classifying text documents into a set of classes from a structured class hierarchy. For example, news articles can be classified into a hierarchical class set which comprises broad categories such as “Politics” and “Sport” at higher levels, with associated finer-grained categories such as “Europe” and “Cycling” at lower levels. In recent years many different NLP approaches have been significantly improved through the use of transformer-based pre-trained language models (PLMs). PLMs are typically trained on large amounts of textual data through self-supervised tasks such that they acquire language understanding capabilities which can be used to solve various NLP tasks, including HTC. In this thesis, we propose three new approaches for leveraging transformer-based PLMs to improve classification performance on HTC tasks. Our first approach formulates how hierarchy-aware prompts can be applied to discriminative language models such that HTC tasks can scale to problems with very large hierarchical class structures. Our second approach uses label-wise attention mechanisms to obtain label-specific document representations which are used to fine-tune PLMs for HTC tasks. Furthermore, we propose a label-wise attention mechanism which splits the attention mechanisms into the different levels of the class hierarchy and leverages the predictions of all ancestor levels during the prediction of classes at a particular level. The third approach combines features extracted from a PLM and a topic model to train a classifier which comprises convolutional layers followed by a label-wise attention mechanism. We evaluate all three approaches comprehensively and show that our first two proposed approaches obtain state-of-the-art performance on three HTC benchmark datasets. Our results show that the use of prompts and label-wise attention mechanisms to fine-tune PLMs are very effective techniques for classifying text documents into hierarchical class sets. Furthermore, we show that these techniques are able to effectively leverage the language understanding capabilities of PLMs and incorporate the hierarchical class structure information to improve classification performance. We also introduce three new HTC benchmark datasets which comprise the titles and abstracts of research publications from the Web of Science publication database with associated categories. The first two datasets use journal- and citation-based classification schemas respectively, while the third dataset combines these classifications with the aim of removing documents and classes which do not have a clear overlap between the two schemas. We show that this results in a more consistent classification of the publications. Finally, we perform experiments on these three datasets with the best-performing approaches proposed in this thesis to provide a baseline for future research.
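To make the label-wise attention idea above concrete, the sketch below gives a single-level version in which every class attends over the token representations produced by a PLM encoder and receives its own document vector. The dimensions are assumptions, and the hierarchy-aware prompting and per-level attention of the thesis are not shown.

```python
# Single-level label-wise attention head over PLM token states (illustrative; PyTorch).
import torch
import torch.nn as nn

class LabelWiseAttention(nn.Module):
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden_size))
        self.classifier = nn.Linear(hidden_size, 1)

    def forward(self, token_states):                      # (batch, seq_len, hidden)
        scores = token_states @ self.label_queries.T      # (batch, seq_len, num_labels)
        attn = torch.softmax(scores, dim=1)               # attention over tokens, per label
        label_docs = attn.transpose(1, 2) @ token_states  # (batch, num_labels, hidden)
        return self.classifier(label_docs).squeeze(-1)    # (batch, num_labels) logits

head = LabelWiseAttention(hidden_size=768, num_labels=50)
logits = head(torch.randn(4, 128, 768))                   # token states from any PLM encoder
```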
- Item: Impendulo: A Tool for Analysing Programmer Behaviour (Stellenbosch : Stellenbosch University, 2015-03) Jordaan, Pieter; Visser, Willem; Geldenhuys, Jaco; Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Computer Science). ENGLISH ABSTRACT: Automated submission systems for Computer Science courses are commonplace today; however, these systems are typically focused solely on grading submissions, and their ability to provide analysis and feedback is vastly underutilised. This thesis seeks to address this by presenting Impendulo, a platform for analysing programmer behaviour. Impendulo allows one to study student performance at both a group and an individual level. This is achieved through the use of a customisable set of software analysis tools which Impendulo runs on user submissions. The results produced by the tools are parsed by Impendulo to produce rich feedback in the form of interactive charts, annotated code and detailed reports. In order to ascertain whether Impendulo's approach is viable, experimental trials were conducted with it throughout its development process. These trials consisted of a group of students using the Impendulo system while solving a set of programming problems. After each trial, the results were studied and used to determine which aspects of the system needed to be improved. At the end of Impendulo's development, all of the experiments were studied again to gain insight into the tested students' programming.
- Item: Implementation of the Cavalieri Integral (2023-02) van Zyl, Christoff; Grobler, Trienko; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: Cavalieri integration in Rⁿ presents a novel visualization mechanism for weighted integration and challenges the notion of strictly rectangular integration strips. It does so by concealing the integrator inside the boundary curves of the integral. This thesis investigates the Cavalieri integral as a superset of Riemann integration in Rⁿ⁻¹, whereby the integral is defined by a translational region in Rⁿ⁻¹ which uniquely defines the integrand, integrator and integration region. In R², this refined translational-region definition allows for the visualization of Riemann-Stieltjes integrals along with other forms of weighted integration, such as the Riemann–Liouville fractional integral and the convolution operator. The programmatic implementation of such visualizations and the computation of integral values are also investigated, relying on numeric integration, algorithmic differentiation and numeric root finding. For the R³ case, such visualizations over polygonal regions require a mechanism for the triangulation of a set of nested polygons, and transformations which allow repeated integration to be used to compute the integral value over the produced triangular regions with standard one-dimensional integration routines.
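As a small numeric companion to the weighted integrals mentioned above, the sketch below approximates a Riemann-Stieltjes integral of f with respect to an integrator g by summing f at midpoints against increments of g; the Cavalieri construction itself (hiding g in the boundary curves) is not reproduced here.

```python
# Midpoint Riemann-Stieltjes sum of f with respect to g over [a, b] (illustrative).
import numpy as np

def riemann_stieltjes(f, g, a: float, b: float, n: int = 10_000) -> float:
    x = np.linspace(a, b, n + 1)
    midpoints = 0.5 * (x[:-1] + x[1:])
    return float(np.sum(f(midpoints) * np.diff(g(x))))

# Example: integral of x d(x^2) over [0, 1] = integral of 2x^2 dx = 2/3.
print(riemann_stieltjes(lambda x: x, lambda x: x**2, 0.0, 1.0))
```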