- Integrating Bayesian network structure into normalizing flows and variational autoencoders (Stellenbosch : Stellenbosch University, 2023-03) Mouton, Jacobie; Kroon, Steve; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: Deep generative models have become more popular in recent years due to their good scalability and representation capacity. However, these models do not typically incorporate domain knowledge. In contrast, probabilistic graphical models specifically constrain the dependencies between the variables of interest as informed by the domain. In this work, we therefore consider integrating probabilistic graphical models and deep generative models in order to construct models that are able to learn complex distributions, while remaining interpretable by leveraging prior knowledge about variable interactions. We specifically consider the type of domain knowledge that can be represented by Bayesian networks, and restrict our study to the deep generative frameworks of normalizing flows and variational autoencoders. Normalizing flows (NFs) are an important family of deep neural networks for modelling complex distributions as transformations of simple base distributions. Graphical flows add further structure to NFs, allowing one to encode non-trivial variable dependencies in these distributions. Previous graphical flows have focused primarily on a single flow direction: either the normalizing direction for density estimation, or the generative direction for inference and sampling. However, to use a single flow to perform tasks in both directions, the model must exhibit stable and efficient flow inversion. This thesis introduces graphical residual flows (GRFs): graphical flows based on invertible residual networks, which ensure stable invertibility by spectral normalization of their weight matrices.
Experiments confirm that GRFs provide performance competitive with other graphical flows for both density estimation and inference tasks. Furthermore, our model provides stable and accurate inversion that is also more time-efficient than alternative flows with similar task performance. We therefore recommend the use of GRFs over other graphical flows when the model may be required to perform reliably in both directions. Since flows employ a bijective transformation, the base or latent distribution must have the same dimensionality as the observed data. Variational autoencoders (VAEs) address this shortcoming by allowing practitioners to specify any number of latent variables. Initial work on VAEs assumed independent latent variables with simple prior and variational distributions. Subsequent work has explored incorporating more complex distributions and dependency structures: including NFs in the encoder network allows latent variables to entangle non-linearly, creating a richer class of distributions for the approximate posterior, and stacking layers of latent variables allows more complex priors to be specified. In this vein, this thesis also explores incorporating arbitrary dependency structures, as specified by Bayesian networks, into VAEs. This is achieved by extending both the prior and inference network with the above GRF, resulting in the structured invertible residual network (SIReN) VAE. We specifically consider GRFs, since the application of the flow in the VAE prior necessitates stable inversion. We compare our model's performance on several datasets to models that encode no special dependency structures, and show its potential to provide a more interpretable model as well as better generalization performance in data-sparse settings. We also identify posterior collapse, where some latent dimensions become inactive and are effectively ignored by the model, as an issue with SIReN-VAE, since it is linked with the encoded structure.
As such, we employ various combinations of existing approaches to alleviate this phenomenon.
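The invertibility property behind residual flows can be illustrated with a minimal numpy sketch. This is not the thesis's model: the layer sizes, the single tanh layer, and the 2x safety factor on the spectral norm are all illustrative. The idea is that spectral normalization keeps the residual branch contractive, so the layer can be inverted stably by Banach fixed-point iteration.

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_normalize(W, n_iters=30):
    """Estimate ||W||_2 by power iteration and rescale W so the
    residual branch is a contraction (Lipschitz constant < 1)."""
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v
    return W / (2.0 * sigma)  # illustrative safety factor: norm ~ 0.5

W = spectral_normalize(rng.normal(size=(3, 3)))

def g(x):
    """One-layer residual branch; tanh is 1-Lipschitz, so the branch
    inherits the (normalized) spectral norm of W."""
    return np.tanh(W @ x)

def forward(x):
    """Normalizing direction: y = x + g(x)."""
    return x + g(x)

def invert(y, n_iters=100):
    """Generative direction: fixed-point iteration x <- y - g(x),
    which converges because g is contractive."""
    x = y.copy()
    for _ in range(n_iters):
        x = y - g(x)
    return x

x = rng.normal(size=3)
y = forward(x)
print(np.allclose(invert(y), x))  # True
```

The same iteration is what makes stable inversion cheap: no explicit inverse of the network is ever formed, only repeated forward evaluations of the residual branch.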
- Using transformers to assign ICD codes to medical notes (Stellenbosch : Stellenbosch University, 2023-03) Dreyer, Andrei Michael; Van der Merwe, Brink; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: International Classification of Disease (ICD) coding plays a significant role in classifying morbidity and mortality rates. Currently, ICD codes are assigned to a patient's medical record by hand by medical practitioners or specialist clinical coders. This practice is prone to errors, and training skilled clinical coders requires time and human resources. Automatic prediction of ICD codes can help alleviate this burden. In this research, we look at transformer-based architectures for predicting ICD codes. Firstly, we expand the size of an XLNet model with label-wise attention to determine whether an increase in model size leads to a better-performing model. We also consider two transformer-based architectures that are specifically designed to handle long input sequences, and compare their results with those of our best-performing XLNet model. Lastly, we evaluate different attention mechanisms with our XLNet model to determine which works best. We found three things: an increase in model size does lead to better results, XLNet performs better than the architectures designed for longer sequence lengths, and the label-wise attention used by our XLNet model performs better than the other attention mechanisms.
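Label-wise attention, as used on top of the XLNet encoder above, gives each ICD code its own attention distribution over the token representations, so different codes can focus on different parts of a long note. A toy numpy sketch of the mechanism (all sizes and parameter names are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(1)

T, d, L = 128, 64, 50  # tokens, hidden size, number of ICD codes (toy)
H = rng.normal(size=(T, d))          # token representations from the encoder
U = rng.normal(size=(L, d))          # one learnable attention query per label
W_out = 0.1 * rng.normal(size=(L, d))  # per-label classification weights
b = np.zeros(L)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Each label attends over all tokens, producing a label-specific
# document vector; each code then gets its own sigmoid score.
A = softmax(U @ H.T, axis=1)         # (L, T): attention weights per label
V = A @ H                            # (L, d): label-specific representations
logits = (W_out * V).sum(axis=1) + b
probs = 1.0 / (1.0 + np.exp(-logits))  # multi-label probabilities, one per code
print(probs.shape)
```

Because ICD coding is multi-label, each code receives an independent sigmoid score rather than competing in a single softmax over codes.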
- Solidifying what is known about calibration artefacts and the development of an educational tool to assist in the teaching of interferometric imaging (Stellenbosch : Stellenbosch University, 2023-03) Jackson, Jason Peter; Grobler, Trienko; Ludick, Danie; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: Radio interferometers are arrays of radio antennas that work together to capture celestial radio emission. Imaging involves transforming the raw measurements made by these so-called interferometers into images of the radio sky. The first contribution of this thesis is the creation of an educational tool that utilizes the Transient Array Radio Telescope (TART); this tool can be used to teach radio interferometric imaging to undergraduate and postgraduate students. Calibration is the act of correcting for effects that may have interfered with the celestial radio emission an interferometer receives. Calibration artefacts, or systematics, are inadvertently created when we calibrate our instrument. Calibrating with an incomplete sky model in particular can create artefacts called ghosts: spurious sources that do not truly exist. A second contribution of the thesis is the creation of a scientific tool with which calibration artefacts can be studied. This tool is then used to investigate which artefacts form when a single extended source is only partially modelled (with a point source model). The results of this study show that, for the aforementioned extended-source use-case, ghosts become extended sources themselves. They also alter the original extended source in various ways: the original source takes on the same flux scale as the source in the calibration model, and its profile changes, becoming more point-like.
The shorter baselines are also more severely affected than the longer baselines, and, in contrast to previous studies, for this particular setup the number of antennas does not impact the severity of the artefacts that are created.
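The calibration step being studied above can be sketched in miniature: per-antenna complex gains are solved for by fitting model visibilities to observed ones. The toy below is noise-free and uses a complete sky model, so calibration succeeds exactly; the thesis's ghost studies correspond to deliberately omitting sources from the model M. The antenna layout, source parameters, and the damped alternating-least-squares iteration (in the spirit of StEFCal-type solvers) are all illustrative, not the thesis's tool.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 7  # antennas (toy)

g_true = 1 + 0.1 * rng.normal(size=N) + 0.1j * rng.normal(size=N)
ant = rng.normal(size=(N, 2))  # toy antenna positions (in wavelengths)

def sky_vis(sources):
    """Noise-free visibilities for point sources given as (flux, l, m)."""
    V = np.zeros((N, N), dtype=complex)
    for flux, l, m in sources:
        e = np.exp(-2j * np.pi * (ant @ np.array([l, m])))
        V += flux * np.outer(e, e.conj())
    return V

M = sky_vis([(1.0, 0.0, 0.0), (0.2, 0.3, 0.1)])  # complete sky model
V_obs = np.outer(g_true, g_true.conj()) * M      # corrupted by antenna gains

# Damped alternating least squares: for fixed g_q, each g_p has a
# closed-form least-squares update from V_pq ~ g_p M_pq conj(g_q).
g = np.ones(N, dtype=complex)
for _ in range(500):
    g_old = g.copy()
    for p in range(N):
        z = M[p, :] * g_old.conj()          # model row scaled by conj gains
        g[p] = (V_obs[p, :] @ z.conj()) / (z @ z.conj()).real
    g = 0.5 * (g + g_old)                   # damping stabilises convergence

V_corr = V_obs / np.outer(g, g.conj())      # corrected visibilities
print(np.linalg.norm(V_corr - M) / np.linalg.norm(M))  # small residual
```

With an incomplete model (e.g. dropping the second source from M), the solved gains absorb the missing flux, and the corrected visibilities acquire the baseline-dependent residuals that manifest as ghosts in the image.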
- Implementation of the Cavalieri Integral (2023-02) van Zyl, Christoff; Grobler, Trienko; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: Cavalieri integration in Rⁿ presents a novel visualization mechanism for weighted integration and challenges the notion of strictly rectangular integration strips. It does so by concealing the integrator inside the boundary curves of the integral. This paper investigates the Cavalieri integral as a superset of Riemann integration in Rⁿ⁻¹, whereby the integral is defined by a translational region in Rⁿ⁻¹, which uniquely defines the integrand, integrator and integration region. In R², this refined translational-region definition allows for the visualization of Riemann–Stieltjes integrals along with other forms of weighted integration, such as the Riemann–Liouville fractional integral and the convolution operator. Programmatic implementation of such visualizations and computation of integral values are also investigated; these rely on numeric integration, algorithmic differentiation and numeric root finding. For the R³ case, such visualizations over polygonal regions require a mechanism for the triangulation of a set of nested polygons, together with transformations that allow repeated integration to be used to compute the integral value over the produced triangular regions using standard 1-dimensional integration routines.
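The Riemann–Stieltjes integrals that the R² visualizations build on are straightforward to approximate numerically. A minimal midpoint-rule sketch, unrelated to the paper's actual implementation:

```python
import numpy as np

def riemann_stieltjes(f, g, a, b, n=2000):
    """Midpoint approximation of the Riemann-Stieltjes integral
    of f with respect to the integrator g over [a, b]:
        sum over i of  f(m_i) * (g(x_{i+1}) - g(x_i)).
    """
    x = np.linspace(a, b, n + 1)
    mids = 0.5 * (x[:-1] + x[1:])
    return np.sum(f(mids) * np.diff(g(x)))

# Worked check: integral of x d(x^2) over [0, 1]
# equals integral of 2x^2 dx over [0, 1] = 2/3.
val = riemann_stieltjes(lambda x: x, lambda x: x**2, 0.0, 1.0)
print(val)  # approximately 0.6667
```

Here the increments of the integrator g replace the uniform strip widths of ordinary Riemann integration, which is exactly the weighting that the Cavalieri construction makes visible geometrically.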
- Rule Induction with Swarm Intelligence (Stellenbosch : Stellenbosch University, 2022-03) van Zyl, Jean-Pierre; Engelbrecht, Andries Petrus; Stellenbosch University. Faculty of Science. Dept. of Computer Science. ENGLISH ABSTRACT: Rule induction is the process by which explainable mappings are created between a set of input data instances and a set of labels for the input instances. This process can be seen as an extension of traditional classification algorithms, because rule induction algorithms perform classification but have the added property of being transparent when making inferences. Popular algorithms in existing literature tend to use antiquated approaches to induce rule sets. The existing approaches tend to be greedy in nature and do not provide a platform for algorithm expansion or improvement. This thesis investigates a new approach to rule induction using a set-based particle swarm optimisation algorithm. The investigation starts with a comprehensive review of the relevant literature, after which the novel algorithm is proposed and compared with popular rule induction algorithms. After the establishment of the capabilities and validity of the set-based particle swarm optimisation rule induction algorithm, the effect of the objective function on the algorithm is investigated. The objective function is tested with 12 existing performance evaluation metrics in order to understand how the performance of the algorithm can be improved. These 12 existing metrics are then used as inspiration for the proposal of 11 new performance evaluation metrics, which are also tested as part of the objective function effect analysis. The effect of varying distributions of the values of the target class is also examined.
This thesis also investigates the reformulation of the rule induction problem as a multi-objective optimisation problem, and applies the newly developed multi-guide set-based particle swarm optimisation algorithm to the multi-objective formulation of rule induction. The performance of rule induction as a multi-objective problem is evaluated by examining how the trade-off between the defined objective functions affects performance for different datasets. The existing metrics and newly proposed metrics tested in the single-objective formulation of the rule induction problem are also tested in the multi-objective formulation.
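What makes a set-based swarm a natural fit here is that a candidate rule is literally a set of conditions, so particles can move through the space of condition sets. A toy illustration of that representation with a precision-times-coverage objective; the dataset, names, and metric are illustrative only, not the thesis's 12 existing or 11 proposed metrics:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy categorical dataset: rows of feature values plus a binary target
# that depends only on feature 0.
X = rng.integers(0, 3, size=(200, 4))
y = (X[:, 0] == 1).astype(int)

def rule_score(rule, X, y, target=1):
    """Score a rule, represented as a set of (feature, value)
    conditions, by the product of its precision and coverage."""
    mask = np.ones(len(X), dtype=bool)
    for feat, val in rule:
        mask &= X[:, feat] == val
    covered = mask.sum()
    if covered == 0:
        return 0.0
    precision = (y[mask] == target).mean()
    coverage = covered / len(X)
    return precision * coverage

good = {(0, 1)}            # "feature 0 == 1 -> class 1"
noisy = {(0, 1), (2, 0)}   # an extra condition shrinks coverage
print(rule_score(good, X, y), rule_score(noisy, X, y))
```

An objective of this shape gives the swarm a single scalar to optimise per rule; swapping in different evaluation metrics, as the thesis does, only changes `rule_score`, not the set-based search itself.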