Department of Applied Mathematics
Browsing Department of Applied Mathematics by advisor "Brink, Willie"
- Accurate camera position determination by means of moiré pattern analysis (Stellenbosch : Stellenbosch University, 2015-03) Zuurmond, Gideon Joubert; Brink, Willie; Herbst, B. M.; Stellenbosch University. Faculty of Science. Department of Applied Mathematics. ENGLISH ABSTRACT: We introduce a method for determining the position of a camera with accuracy beyond that which is obtainable through conventional methods, using a single image of a specially constructed calibration object. This is achieved by analysing the moiré pattern that emerges when two high spatial frequency patterns are superimposed, such that one pattern on a plane is observed through another pattern on a second, semi-transparent parallel plane, with the geometry of both the patterns and the planes known. Such an object can be created by suspending printed glass over printed paper or by suspending printed glass over a high resolution video display such as an OLED display or LCD. We show how the camera’s coordinate along the axis perpendicular to the planes can be estimated directly from frequency analysis of the moiré pattern relative to a set of guide points in one of the planes. This method does not require any prior camera knowledge. We further show how the choice of the patterns allows, within limits, arbitrary accuracy of this coordinate estimate at the cost of a stricter limit on the span along that coordinate for which the technique is usable. This improved accuracy is illustrated in simulation. With a sufficiently accurate estimate of the camera’s full set of 3D coordinates, obtained by conventional methods, we show how phase analysis of the moiré pattern in relation to the guides allows calculation of a new estimate of position in the two axes parallel to the planes. This new estimate is shown in simulation to offer significant improvement in accuracy.
- Analysing retinal fundus images with deep learning models (Stellenbosch : Stellenbosch University, 2023-12) Ofosu Mensah, Samuel; Bah, Bubacarr; Brink, Willie; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Applied Mathematics Division. ENGLISH ABSTRACT: Convolutional neural networks (CNNs) have successfully been used to classify diabetic retinopathy but they do not provide immediate explanations for their decisions. Explainability is relevant, especially for clinicians. To make results explainable, we use a post-attention technique called gradient-weighted class activation mapping (Grad-CAM) on the penultimate layer of deep learning models to produce localisation maps on retinal fundus images after using them to classify diabetic retinopathy. Moreover, the models were initialised using pre-trained weights obtained from training models on the ImageNet dataset. The results of this are fewer training epochs and improved performance. Next, we predict cardiovascular risk factors (CVFs) using retinal fundus images. In detail, we use a multi-task learning (MTL) model since there are several CVFs. The impact of using an MTL model is the advantage of simultaneously training for and predicting several CVFs rather than doing so individually. Also, we investigate the performance of the fundus cameras used to capture the retinal fundus images. We notice a superior performance of the desktop fundus cameras to the handheld fundus camera. Finally, we propose a hybrid model that fuses convolutions and Transformer encoders. This is done to harness the benefits of convolutions and Transformer encoders. We compare the performance of the proposed model with other attention-based models and observe on-par performance.
- Applications of natural language processing for low-resource languages in the healthcare domain (Stellenbosch : Stellenbosch University, 2020-03) Daniel, Jeanne Elizabeth; Brink, Willie; Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Applied Mathematics). ENGLISH ABSTRACT: Since 2014 MomConnect has provided healthcare information and emotional support in all 11 official languages of South Africa to over 2.6 million pregnant and breastfeeding women, via SMS and WhatsApp. However, the service has struggled to scale efficiently with the growing user base and increase in incoming questions, resulting in a current median response time of 20 hours. The aim of our study is to investigate the feasibility of automating the manual answering process. This study consists of two parts: i) answer selection, a form of information retrieval, and ii) natural language processing (NLP), where computers are taught to interpret human language. Our problem is unique in the NLP space, as we work with a closed-domain question-answering dataset, with questions in 11 languages, many of which are low-resource, with English template answers, unreliable language labels, code-mixing, shorthand, typos, spelling errors and inconsistencies in the answering process. The shared English template answers and code-mixing in the questions can be used as cross-lingual signals to learn cross-lingual embedding spaces. We combine these embeddings with various machine learning models to perform answer selection, and find that the Transformer architecture performs best, achieving a top-1 test accuracy of 61.75% and a top-5 test accuracy of 91.16%. It also exhibits improved performance on low-resource languages when compared to the long short-term memory (LSTM) networks investigated. Additionally, we evaluate the quality of the cross-lingual embeddings using parallel English-Zulu question pairs, obtained using Google Translate.
Here we show that the Transformer model produces embeddings of parallel questions that are very close to one another, as measured using cosine distance. This indicates that the shared template answer serves as an effective cross-lingual signal, and demonstrates that our method is capable of producing high quality cross-lingual embeddings for low-resource languages like Zulu. Further, the experimental results demonstrate that automation using a top-5 recommendation system is feasible.
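The abstract above measures how close two question embeddings are using cosine distance. As a minimal illustration of that measure only (not the thesis code), the sketch below computes it in plain Python; the function name `cosine_distance` and the three example vectors are hypothetical and chosen purely for demonstration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v); 0 means identical direction."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical embeddings, invented for illustration only.
english_question = [0.20, 0.70, 0.10]
zulu_question = [0.25, 0.65, 0.05]      # stands in for a parallel translation
unrelated_question = [0.90, -0.10, 0.40]

print(cosine_distance(english_question, zulu_question))
print(cosine_distance(english_question, unrelated_question))
```

Under this measure, a pair of parallel questions that a model embeds well should give a distance near 0, while unrelated questions should give a larger value.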
- Automated elephant detection and classification from aerial infrared and colour images using deep learning (Stellenbosch : Stellenbosch University, 2018-03) Marais, Jacques Charles; Brink, Willie; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences (Applied Mathematics). ENGLISH ABSTRACT: In this study we attempt to detect and classify elephants in aerial images using deep learning. This is not a trivial task even for a human since elephants naturally blend in with their surroundings, making it a challenging and meaningful problem to solve. Possible applications of this work extend into general animal conservation and search-and-rescue operations, with natural extension to satellite imagery as input source. We create a region proposal algorithm that relies on digital image processing techniques and morphological operations on infrared images that correspond to the RGB images. The goal is to create a fast and computationally cheap algorithm that reduces the work that needs to be done by our deep learning classification models. The algorithm reaches our accuracy goal, detecting 98% of all ground truth elephants in the dataset. The resulting regions are mapped onto the corresponding RGB images using a plane-to-plane homography along with adjustment heuristics to overcome alignment issues caused by sensor vibration. We train multiple convolutional neural network models, using various network architectures and weight initialisation techniques, including transfer learning. Two sets of models were trained, in 2015 and 2017 respectively, using different techniques, software, and hardware. The best performing model reduces the manual verification workload by 97% while missing only 1% of the elephants detected by the region proposal algorithm. We find that convolutional neural networks, as well as the advancements in deep learning, hold significant promise in detecting elephants from aerial images for real world applications.
- Automatic video captioning using spatiotemporal convolutions on temporally sampled frames (Stellenbosch : Stellenbosch University, 2020-03) Nyatsanga, Simbarashe Linval; Brink, Willie; Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Applied Mathematics). ENGLISH ABSTRACT: Being able to concisely describe content in a video has tremendous potential to enable better categorisation, index-based search and fast content-based retrieval from large video databases. Automatic video captioning requires the simultaneous detection of local and global motion dynamics of objects, scenes and events, to summarise them into a single coherent natural language description. Given the size and complexity of video data, it is important to understand how much temporally coherent visual information is required to adequately describe the video. In order to understand the association between video frames and sentence descriptions, we carry out a systematic study to determine how the quality of generated captions changes with respect to densely or sparsely sampling video frames in the temporal dimension. We conduct a detailed literature review to better understand the background work in image and video captioning. We describe our methodology for building a video caption generator, which is based on deep neural networks called encoder-decoders. We then outline the implementation details of our video caption generator and our experimental setup. In our experimental setup, we explore the role of word embeddings for generating sensible captions with pretrained, jointly trained and finetuned embeddings. We train and evaluate our caption generator on the Microsoft Video Description (MSVD) dataset. Using the standard caption generation evaluation metrics, namely BLEU, METEOR, CIDEr and ROUGE, our experimental results show that sparsely sampling video frames with either finetuned or jointly trained embeddings results in the best caption quality.
Our results are promising in the sense that high quality videos with a large memory footprint could be categorised through a sensible description obtained through sampling a few frames. Finally, our method can be extended such that the sampling rate adapts according to the quality of the video.
- The class imbalance problem in computer vision (Stellenbosch : Stellenbosch University, 2022-04) Crous, Willem Hendrik; Brink, Willie; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences (Applied Mathematics). ENGLISH ABSTRACT: Class imbalance is a naturally occurring phenomenon, typically characterised as a dataset consisting of classes with varying numbers of samples. When trained on class imbalanced data, networks tend to favour frequently occurring (majority) classes over the less frequent (minority) classes. This poses challenges for tasks reliant upon accurate recognition of the less frequent classes. The aim of this thesis is to investigate general methods towards addressing this problem. First we establish why a network may favour majority classes. We contend that as less frequent classes are likely to under-represent the required underlying distribution for a given task, training may produce a decision boundary that transgresses the feature space of minority classes. Additionally we find that the weight norms of the classification layer in a neural network may tend towards the distribution of the training data, thus affecting the decision boundary. We determine that this decision boundary shift impacts both the accuracy and confidence calibration of neural networks. We investigate several approaches to shift the decision boundary. The first approach acquires additional data and increases the representation of minority classes. This is achieved through either creating synthetic samples following a distribution-aware regularisation method, or utilising additional unlabelled data in a semi-supervised setting. The second approach aims to adjust the classifier weight norms by separately training the classifier and feature extractor. We find that implementing an effective regularisation method with a simple decoupled sampling scheme can provide considerable improvements over standard sampling methods.
Furthermore we find that utilising additional unlabelled data may lead to additional gains given certain dataset characteristics are taken into consideration.
- Convolutional and fully convolutional neural networks for the detection of landmarks in tsetse fly wing images (Stellenbosch : Stellenbosch University, 2021-12) Makhubele, Mulanga; Brink, Willie; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Division Applied Mathematics. ENGLISH ABSTRACT: Tsetse flies are a species of bloodsucking flies in the house fly family, that are only found in Africa. They cause animal and human African trypanosomiasis (AAT and HAT), commonly referred to as nagana and sleeping sickness. Effective tsetse fly eradication requires area-wide control, which means understanding the population dynamics of the tsetse flies in an area. Among the factors that entomologists believe to be critical to this understanding, fly size and fly wing shape are considered most important. Fly size can be deduced by calculating the distance between specific landmarks on a wing. The South African Centre for Epidemiological Modelling and Analysis (SACEMA) conducts research into tsetse fly population management and has a database of wings. To use landmarks on the wings for biological deductions about the tsetse flies in the area, researchers will need to manually annotate individual images of the wings by marking the important landmarks by hand, which is slow and error-prone. The purpose of this research is to assess the feasibility of automating the process of landmark detection in tsetse fly wing images using machine learning algorithms with a limited dataset. Extensive research has been done into automatic landmark detection. Particular focus has been given to detection of human body parts but there are a number of notable cases of animal landmark detection. Convolutional neural networks (CNNs) have been used as backbone architectures for most state-of-the-art detection systems.
We compare the performance of fully convolutional networks (FCNs) against conventional LeNet style CNNs for the regression task of landmark detection in a fly wing image. The FCN accepts an image input and returns a segmentation mask as output. A Gaussian function is used to convert the response coordinate pairs into heat maps, which are combined to form a segmentation mask. After model training the heat maps produced by the FCN model are converted back to coordinate pairs using a weighted average method. Three types of models were trained: a baseline artificial neural network (ANN), LeNet style CNNs and FCNs. The ANN model had a root mean square error (RMSE) of 282.62 pixels and mean absolute error (MAE) of 181.33 pixels. The best LeNet model, LeNet3 with dropout, had an RMSE of 53.58 and MAE of 41.05. The best FCN model FCN8 with batch size 32 and Adam optimization, had an RMSE of 1.12 and MAE of 0.88. All trained models were best at predicting landmark points 5, 8 and 10 and struggled to predict landmark points 1, 4 and 6. The results indicate that machine learning models can be used to automatically and accurately detect landmark points on tsetse fly wing images. Furthermore, for our limited dataset FCNs outperform conventional LeNet style CNNs.
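The abstract above describes converting landmark coordinates into Gaussian heat maps and recovering coordinates from heat maps with a weighted average. The sketch below illustrates that round trip in plain Python. It is an assumption-laden illustration rather than the thesis implementation: the function names, the value of `sigma`, the image size and the unnormalised Gaussian are all hypothetical choices.

```python
import math


def landmark_to_heatmap(x, y, width, height, sigma=2.0):
    """Render one landmark (x, y) as a Gaussian bump on a height x width grid."""
    return [
        [
            math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2.0 * sigma ** 2))
            for col in range(width)
        ]
        for row in range(height)
    ]


def heatmap_to_landmark(heatmap):
    """Recover (x, y) as the intensity-weighted average of pixel coordinates."""
    total = sum(sum(row) for row in heatmap)
    if total == 0.0:
        raise ValueError("heat map has no mass; cannot locate a landmark")
    x = sum(col * value for row in heatmap for col, value in enumerate(row)) / total
    y = sum(r * value for r, row in enumerate(heatmap) for value in row) / total
    return x, y


# Hypothetical landmark at a sub-pixel position in a 32 x 40 image.
heatmap = landmark_to_heatmap(12.3, 20.7, width=32, height=40)
decoded_x, decoded_y = heatmap_to_landmark(heatmap)
print(decoded_x, decoded_y)
```

Because the weighted average uses every pixel rather than only the brightest one, it can return sub-pixel coordinates, provided the Gaussian is not heavily clipped by the image border.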
- Data-driven river flow routing using deep learning: predicting flow along the lower Orange river, Southern Africa (Stellenbosch : Stellenbosch University, 2019-04) Briers, C. J.; Brink, Willie; Smit, G. J. F.; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Division Applied Mathematics. ENGLISH ABSTRACT: The Vanderkloof Dam, located on the Orange River, is responsible for the water supply to consumers along its 1 400 km reach up to where it flows into the Atlantic Ocean. The Vaal River, which joins the Orange River approximately 200 km downstream of the dam, contributes significant volumes of water to the flow in the Orange River. These contributions are, however, not taken into account when planning for releases from the Vanderkloof Dam. In this thesis we aimed to develop an accurate and robust flow routing model of the Orange and Vaal River system to predict the effects of releases from the Vanderkloof Dam and anticipate inflows from the Vaal River. Since the factors that impact on flow rate and volume along the river are hard to quantify over long distances, a data-driven approach is followed which uses machine learning to predict the flow rate at downstream flow gauging stations based on flow rates recorded at upstream gauging stations. We restrict the model input to data that would be readily available in an operational setting, making the model practically implementable. A variety of neural network architectures, including fully-connected networks, convolutional neural networks (CNNs) and recurrent neural networks (RNNs), were investigated. It was found that fully-connected networks produce results with accuracy comparable to a simple linear regression model, but display a superior ability to predict the timing of peaks and troughs in flow rate trends. CNNs and RNNs displayed the same ability, as well as showing improvements in accuracy.
The best-performing CNN model had a mean absolute percentage error (MAPE) of 14.5 % compared to 16.9 % of a linear regression model. To anticipate contributions from the Vaal River we investigated including inflows recorded at stations on the Vaal River and two of its tributaries, the Modder and Riet Rivers. Both approaches which were investigated, i.e. incorporating these inflows as part of multi-dimensional input into a CNN, and using a parallel CNN model architecture, showed promise with a MAPE of 21.6 % and 23.5 %, respectively. Although these models did not achieve a high level of accuracy, they did display the ability to anticipate contributions from the Vaal River system. It is believed that they could, with additional refinement or using appropriate safety factors, be practically applied in an operational setting. We further investigated including seasonal data as input into our models. Including the time of the year, and including evaporation data recorded at meteorological stations in the recent past, both resulted in improved MAPE accuracy (14.4 % and 14.8 %, respectively, compared to 18.4 % for a model including no seasonal data). Observations of errors staying relatively constant over time prompted us to include errors made in the recent past as input into subsequent predictions. A model trained with this additional data achieved a MAPE of 10.2 %, a significant improvement over other applied methods.
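The abstract above compares models by mean absolute percentage error (MAPE). For readers unfamiliar with the metric, the sketch below computes it in plain Python. The function name `mape` and the flow values are hypothetical and are not taken from the thesis.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent, over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    relative_errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical observed and predicted flow rates, for illustration only.
observed_flow = [100.0, 200.0, 400.0]
predicted_flow = [110.0, 180.0, 400.0]
print(mape(observed_flow, predicted_flow))
```

Each error is scaled by the observed value, so an error of 10 units counts for more at low flow than at high flow, and the metric cannot be evaluated where the observed value is zero.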
- Evaluating the effectiveness of neural network techniques in the forecasting of South African basic fuel prices (Stellenbosch : Stellenbosch University, 2019-04) Kingwill, Russell; Brink, Willie; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Division Applied Mathematics. ENGLISH ABSTRACT: South Africa has a number of fuel grades available to consumers, one of the most popular being the 95 unleaded standard. The price of this fuel is comprised of many components including transport fees, taxes and the basic fuel price. The basic fuel price is the cost in Rand of Brent crude oil used to refine the unit of petrol fuel, and is often the most significant component of the fuel price as well as the most volatile. Having a reliable forecasting methodology for the basic fuel price would be a helpful planning tool for many individuals and small enterprises. The forecasting of general fuel prices has been studied in the past with various forecasting techniques ranging from machine learning to ARIMA and regression models. In this study various deep learning models, including feed forward, recurrent and convolutional neural networks are assessed for their ability to accurately forecast the basic fuel price. These models are ranked by their ability to reduce the mean absolute percentage error on a common test data set. A number of time series data sets are used as input for the models under review, which include the closing daily price of Brent crude oil and the closing daily US Dollar exchange rate. The effect of inputting the 30 day rolling future contracts for both the closing oil price and exchange rates is also investigated. Overall it is determined that, of the models evaluated during this study, the recurrent network performs the most favourably. On the final test set, with optimal model and input parameters, the individual observation errors range from less than 1 % to more than 10 %.
The average test error of 4.57 % can be a bit misleading due to the observed range of individual errors. Hence it is not as reliable a forecast as one would hope for. However, the model did prove to have a fairly reliable ability to correctly forecast the direction of the basic fuel price change. It did so in about 86% of the test data set observations, and was off by only a few cents when an incorrect direction was forecast. It is concluded that neural network models can be used to some degree for the task of forecasting the South African basic fuel price. Such models are sensitive to the amount of data provided and hence future work in this area should prioritise obtaining more data and if possible incorporating additional data sources.
- Image and attribute based identification of Protea species (Stellenbosch : Stellenbosch University, 2020-04) Thompson, Peter; Brink, Willie; Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Applied Mathematics). ENGLISH ABSTRACT: The flowering plant genus Protea is a dominant representative for the biodiversity of the Cape Floristic Region in South Africa, and from a conservation point of view important to monitor. The recent surge in popularity of crowd-sourced wildlife monitoring platforms presents opportunities for automatic image based identification, for improved monitoring of species. We consider the problem of identifying the Protea species in a given image with additional (but optional) attributes linked to the observation, such as location, elevation and date. We collect training and test data from a crowd-sourced platform, and find that the Protea identification problem is exacerbated by considerable inter-class similarity, data scarcity, class imbalance, as well as large variations in image quality, composition and background. Our proposed solution consists of three parts. The first part incorporates a variant of multi-region attention into a pretrained convolutional neural network, to focus on the flowerhead in the image. The second part performs coarser-grained classification on subgenera (superclasses) and then rescales the output of the first part. The third part conditions a probabilistic model on the additional attributes associated with the observation. We perform an ablation study on the proposed model and its constituents, and find that all three components together outperform our baselines and all other variants quite significantly.
- Link prediction in knowledge graphs using latent feature modelling and neural tensor factorisation (Stellenbosch : Stellenbosch University, 2020-12) Magangane, Luyolo; Brink, Willie; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Division Applied Mathematics. ENGLISH ABSTRACT: Reasoning over knowledge expressed in natural language is a problem at the forefront of artificial intelligence. Question answering is one of the core tasks of this problem, and is concerned with giving machines the capability of generating an answer given a question, by mimicking the reasoning behaviour of humans. Relational learning, in combination with information retrieval, has been explored as a framework for solving this problem. Knowledge graphs (KGs) are used to represent facts about multiple domains as entities (nodes) and relations (edges), and the resource description framework formalism, subject-predicate-object, is used to encode these facts. Link prediction then powers knowledge discovery by scoring possible relationships between entities. This thesis explores latent feature modelling using tensor factorisation as an approach to link prediction. Tensor decompositions are an attractive approach as relational domains are usually high-dimensional and sparse, a setting where factorisation methods have shown very good results. Previous approaches have focused on shallow models that can scale to large datasets, and recently deep models have been applied, specifically neural tensor factorisation models, as these models are more expressive and automatically learn the most useful latent features for entities and relations. In this work we introduce training algorithm optimisations to the neural tensor network (NTN) and HypER neural tensor factorisation models. We make use of the TensorFlow reimplementation of NTNs and apply early stopping, adaptive moment estimation and hyperparameter optimisation using random search.
We see improvements in both cost and accuracy over the baseline NTN reimplementation, using standard link prediction benchmark datasets WordNet and Freebase. We then apply optimisations to the HypER model training algorithm. We begin with compensating for covariate shift caused by hypernetworks, using batch normalisation, and propose HypER+. We see similar performance to the HypER baseline on the WN18 dataset, and see significant improvement using the FB15k dataset. We extend our optimisation by initialising entity and relation embeddings using pretrained word vectors from the GloVe language model. We see marginal improvements over the baseline using the WN18RR and FB15k-237 datasets. Our results establish HypER+ as a state-of-the-art model in latent feature modelling based link prediction.
- Low-resource image captioning (Stellenbosch : Stellenbosch University, 2022-12) Du Plessis, Mikkel; Brink, Willie; Stellenbosch University. Faculty of Science. Dept. of Applied Mathematics. ENGLISH ABSTRACT: Image captioning combines computer vision and natural language processing, and aims to automatically generate a short natural language phrase that describes relationships between objects and context within a given image. As the field of deep learning evolves, several approaches have produced impressive models and generally follow an encoder-decoder architecture. An encoder is utilised for visual cues and a textual decoder to produce a final caption. This can create a challenging gap between visual and textual representations, and makes the training of image captioning models resource intensive. Consequently, recent image captioning models have relied on a steady increase of training set size, computing requirements and training times. This thesis explores the viability of two model architectures for the task of image captioning in a low-resource scenario. We focus specifically on models that can be trained on a single consumer-level GPU in under 5 hours, using only a few thousand images. Our first model is a conventional image captioning model with a pre-trained convolutional neural network as the encoder, followed by an attention mechanism, and an LSTM as the decoder. Our second model utilises a Transformer in the encoder and the decoder. Additionally, we propose three auxiliary techniques that aim to extract more information from images and training captions with only marginal computational overhead. Firstly, we address the typical sparseness in object and scene representation by taking advantage of top-down and bottom-up features, in order to present the decoder with richer visual information and context. Secondly, we suppress semantically unlikely caption candidates during the decoder’s beam search procedure through the inclusion of a language model.
Thirdly, we enhance the expressiveness of the model by augmenting training captions with a paraphrase generator. We find that the Transformer-based architecture is superior under low-data circumstances. Through a combination of all proposed methods applied, we achieve state-of-the-art performance on the Flickr8k test set and surpass existing recurrent-based methods. To further validate the generalisability of our models, we train on small, randomly sampled subsets of the MS COCO dataset and achieve competitive test scores compared to existing models trained on the full dataset.
- Low-resource neural machine translation for Southern African languages (2021-12) Nyoni, Evander EL-Tabonah; Bassett, Bruce; Brink, Willie. ENGLISH ABSTRACT: The majority of African languages have not fully benefited from the recent advances in machine translation due to lack of data. Motivated by this challenge we leverage and compare transfer learning, multilingual learning and zero-shot learning on three Southern Bantu languages (namely isiZulu, isiXhosa and Shona) and English. We focus primarily on the English-to-isiZulu pair, since it has the smallest number of training pairs (30000 sentences), comprising just 28% of the average size of the other corpora. We demonstrate the significant importance of language similarity on English-to-isiZulu translations by comparing transfer learning and multilingual learning on the English-to-isiXhosa (similar) and English-to-Shona (dissimilar) tasks. We further show that multilingual learning is the best training protocol when there is sufficient data, with BLEU score gains of between 3.8 and 7.9 compared to transfer learning and zero-shot learning respectively for the English-to-isiZulu task. Our findings show that zero-shot learning is better than training a baseline model from scratch if there is not much English-to-isiZulu data. Our best model improves the previous English-to-isiZulu state-of-the-art BLEU score by more than 10. Taken together, our findings highlight the potential of leveraging the inter-relations within and between South Eastern Bantu languages to improve translations in low-resource settings.
- Multitask learning and data distribution search in visual relationship recognition (Stellenbosch : Stellenbosch University, 2020-03) Josias, Shane; Brink, Willie; Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Applied Mathematics). ENGLISH ABSTRACT: An image can be described by the objects within it, as well as the interactions between those objects. A pair of object labels together with an interaction label can be assembled into what is known as a visual relationship, represented as a triplet of the form (subject, predicate, object). Recognising visual relationships in a given image is a challenging task, owing to the combinatorially large number of possible relationship triplets which lead to a so-called extreme classification problem, as well as a very long tail found typically in the distribution of those possible triplets. We investigate the efficacy of four strategies that could potentially address these issues. Firstly, instead of predicting the full triplet we opt to predict each element separately. Secondly, we investigate the use of shared network parameters to perform these separate predictions in a basic multitask setting. Thirdly, we extend the multitask setting by including an online ranking loss that acts on a trio of samples (an anchor, a positive sample, and a negative sample). Semi-hard negative mining is used to select negative samples. Finally, we consider a class-selective batch construction strategy to expose the network to more of the many rare classes during mini-batch training. We view semi-hard negative mining and class-selective batch construction as training data distribution search, in the sense that they both attempt to carefully select training samples in order to improve model performance. In addition to the aforementioned strategies, we also introduce a means of evaluating model behaviour in visual relationship recognition. This evaluation motivates the use of semantics.
Our experiments demonstrate that batch construction can improve performance on the long tail, possibly at the expense of accuracy on the small number of dominating classes. We also find that a basic multitask model neither improves nor impedes performance in any significant way, but that its smaller size may be beneficial. Moreover, multitask models trained with a ranking loss yield a decrease in performance, possibly due to limited batch sizes.
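The ranking loss on (anchor, positive, negative) trios and the semi-hard negative mining mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used definition of a semi-hard negative (farther from the anchor than the positive, but still within the margin) and Euclidean distance between embeddings; the thesis may define or implement these differently, and all function names here are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def semi_hard_negative(anchor, positive, negatives, margin):
    """Pick the closest negative n with d(a, p) < d(a, n) < d(a, p) + margin.

    Assumed definition of "semi-hard"; returns None if no candidate qualifies.
    """
    d_ap = euclidean(anchor, positive)
    candidates = [
        n for n in negatives
        if d_ap < euclidean(anchor, n) < d_ap + margin
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda n: euclidean(anchor, n))


def triplet_loss(anchor, positive, negative, margin):
    """Hinge-style ranking loss on one (anchor, positive, negative) trio."""
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, d_ap - d_an + margin)
```

A semi-hard negative always yields a positive loss smaller than the margin, so it still contributes a gradient without being as extreme as the hardest negatives.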
- ItemPath planning for wheeled mobile robots using an optimal control approach(Stellenbosch : Stellenbosch University, 2019-12) Matebese, Belinda Thembisa; Banda, Mapundi K.; Withey, Daniel; Brink, Willie; Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Applied Mathematics). ENGLISH ABSTRACT: The capability and practical use of wheeled mobile robots in real-world applications have resulted in them being a topic of recent interest. These systems are most prevalent because of their simple design and ease of control. In many cases, they also have an ability to move around in an environment without any human intervention. A main stream of research for wheeled mobile robots is that of planning motions of the robot under nonholonomic constraints. A typical motion planning problem is to find a feasible path in the configuration space of the mobile robot that starts at the given initial state and reaches the desired goal state while satisfying robot kinematic or dynamic constraints. A variety of methods have been used to solve various aspects of the motion planning problem. Depending on the desired quality of the solution, an optimal path is often sought. In this dissertation, optimal control is employed to obtain optimal collision-free paths for two-wheeled mobile robots and manipulators mounted on wheeled mobile platforms from an initial state to a goal state while avoiding obstacles. Obstacle avoidance is mathematically modelled using the potential field technique. The optimal control problem is then solved using an indirect method approach. This approach employs Pontryagin’s minimum principle where analytical solutions for optimality conditions are derived. Solving the optimality condition leads to two sets of differential equations that have to be solved simultaneously and whose conditions are given at different times.
This set of equations is known as a two-point boundary value problem (TPBVP) and can be solved using numerical techniques. An indirect method, namely Leapfrog, is then implemented to solve the TPBVP. The Leapfrog method begins with a feasible trajectory, which is divided into smaller subdivisions where the local optimal controls are solved. The locally optimal trajectories are added and following a certain scheme of updating the number of subdivisions, the algorithm ends with the generation of an optimal trajectory along with the corresponding cost. An advantage of using the Leapfrog method is that it does not depend on the provision of good initial guesses along a path. In addition, the solution provided by the method satisfies both boundary conditions at every step. Moreover, in each iteration the paths generated are feasible and their cost decreases asymptotically. To illustrate the effectiveness of the algorithm numerically, a quadratic cost with the control objective of steering the mobile robot from an initial state to a final state while avoiding obstacles is minimized. Simulations and numerical results are presented for environments with and without obstacles. A comparison is made between the Leapfrog method and the BVP4C optimization algorithm, and also the kinodynamic-RRT algorithm. The Leapfrog method shows value for continued development as a path planning method since it initializes easily, finds kinematically feasible paths without the need of post processing and where other techniques may fail. To our knowledge the work presented here is the first application of the Leapfrog method to find optimal trajectories for motion planning on a two-wheeled mobile robot and mobile manipulator.
- ItemA probabilistic graphical model approach to solving the structure and motion problem(Stellenbosch : Stellenbosch University, 2016-03) Streicher, Simon Frederik; Brink, Willie; Du Preez, J. A.; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences (Applied Mathematics). ENGLISH ABSTRACT: Probabilistic graphical models show great promise in resolving uncertainty within large systems by using probability theory. However, the focus is usually on problems with a discrete representation, or problems with linear dependencies. The focus of this study is on graphical models as a means to solve a nonlinear system, specifically the structure and motion problem. For a given system, our proposed solution makes use of multivariate Gaussians to model parameters as random variables, and sigma point linearisation to capture all interrelationships as covariances. This technique does not need in-depth knowledge about given nonlinearities (such as Jacobian matrices) and can therefore be used as part of a general solution. The aim of structure and motion is to generate a 3D reconstruction of a scene and camera poses, using 2D images as input. We discuss the typical feature based structure and motion pipeline along with the underlying multiview geometry, and use this theory to find relationships between variables. We test our approach by building a probabilistic graphical model for the structure and motion problem and evaluating it on different types of synthetic datasets. Furthermore, we test our approach on two real-world datasets. From this study we conclude that, for structure and motion, there is clear promise in the performance of our system, especially on small datasets. The required runtime quickly increases, and the accuracy of results decreases, as the number of feature points and camera poses increases or the noise in the inputs increases.
However, we believe that further developments can improve the system to the point where it can be used as a practical and robust solution for a wide range of real-world image sets. We further conclude that this method can be a great aid in solving similar types of nonlinear problems where uncertainty needs to be dealt with, especially those without well-known solutions.
- ItemSemi-supervised learning in computer vision(Stellenbosch : Stellenbosch University, 2022-12) Louw, Christiaan; Brink, Willie; Stellenbosch University. Faculty of Science. Dept. of Applied Mathematics. ENGLISH ABSTRACT: Deep learning models have proven to be successful at tasks such as image classification. A major drawback of supervised learning is the need for large labelled datasets to obtain good classification accuracy. This can be a barrier to those in resource-constrained environments wanting to implement a classification model in a previously unexplored field. Recent advancements in unsupervised learning methods, such as contrastive learning, have made it viable to perform representation learning without labels, which when combined with supervised learning on relatively small labelled datasets can lead to state-of-the-art performance on image classification tasks. We study this technique, called semi-supervised learning, and provide an investigation into three semi-supervised learning frameworks. Our work starts by discussing the implementations of the SimCLR, SimSiam and FixMatch frameworks. We compare the results of each framework on the CIFAR-10 and STL-10 datasets in label-scarce scenarios and show that: (1) all frameworks outperform a purely supervised learning baseline when the number of labels is reduced, (2) the improvement in performance of the frameworks over the supervised baseline increases as the number of available labels is decreased and (3) in most cases, the semi-supervised learning frameworks are able to match or outperform the supervised baseline with 10% as many labels.
We also investigate the performance of the SimCLR and SimSiam frameworks on class-imbalanced versions of the CIFAR-10 and STL-10 datasets, and find that: (1) the improvements over the supervised learning baseline are less substantial than in the results with fewer overall, but balanced, class labels, and (2) with basic oversampling implemented the results are significantly improved, with the semi-supervised learning frameworks benefiting the most. The results in this thesis indicate that unsupervised representation learning can indeed lower the number of labelled images required for successful image classification by a significant degree. We also show that each of the frameworks considered in this work serves this function well.
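The "basic oversampling" referred to in the abstract above is not specified further. One common form, shown here purely as an assumed illustration, duplicates randomly chosen samples of each minority class until every class has as many samples as the largest one. The function name and the seeding scheme are hypothetical.

```python
import random
from collections import defaultdict


def oversample(samples, labels, seed=0):
    """Balance classes by randomly duplicating minority-class samples.

    Assumed scheme: every class is padded up to the size of the largest class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels
```

Oversampling of this kind only changes how often existing images are seen during training; it adds no new information about the rare classes.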
- ItemShort-term stream flow forecasting and downstream gap infilling using machine learning techniques(Stellenbosch : Stellenbosch University, 2018-03) Steyn, Melise; Smit, G. J. F.; Brink, Willie; Wilms, Josefine M.; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences (Applied Mathematics)ENGLISH ABSTRACT : Stream flow is an important component in the hydrological cycle and plays a vital role in many hydrological applications. Accurate stream flow forecasts may be used for the study of various hydro-environmental aspects and may assist in reducing the consequences of floods. The utility of time series records for stream flow analyses is often dependent on continuous, uninterrupted observations. However, interruptions are often unavoidable and may negatively impact the sustainable management of water resources. This study proposes the application of machine learning techniques to address these hydrological challenges. The first part of this study focuses on single station short-term stream flow forecasting for river basins where historical time series data are available. Two machine learning techniques were investigated, namely support vector regression and multilayer perceptrons. Each model was trained on historical stream flow and precipitation data to forecast stream flow with a lead time of up to seven days. The Shoalhaven, Herbert and Adelaide rivers in Australia were considered for experimentation. The predictive performance of each model was determined by the Pearson correlation coefficient, the root mean squared error and the Nash-Sutcliffe efficiency, and the predictive capabilities of the models were compared to that of a physically based stream flow forecasting model currently supplied by the Australian Bureau of Meteorology. 
Based on the results, it was concluded that the machine learning models have the ability to overcome certain challenges faced by physically based models and the potential to be useful stream flow forecasting tools in river basin modelling. The second part of this study investigates the ability of support vector regression and multilayer perceptron models to infill incomplete stream flow records. The infilling techniques relied upon data from donor stations and rain gauges within close proximity to the station considered for infilling. A case study was conducted on a channel in the Goulburn basin in Australia. The results showed the promising role of machine learning applications for the infilling of gaps in stream flow records and indicated that data from donor stations contribute more to the success of these models compared to precipitation data.
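The three performance measures named in the abstract above (Pearson correlation coefficient, root mean squared error and Nash-Sutcliffe efficiency) have standard definitions, sketched below for a pair of observed and simulated series. The function names are illustrative and not taken from the thesis.

```python
import math


def rmse(observed, simulated):
    """Root mean squared error between observed and simulated flows."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def pearson(observed, simulated):
    """Pearson correlation coefficient between the two series."""
    mean_o = sum(observed) / len(observed)
    mean_s = sum(simulated) / len(simulated)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov / math.sqrt(var_o * var_s)


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_o = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - residual / spread
```

A forecast that is perfectly correlated with the observations can still have a low Nash-Sutcliffe efficiency if it is biased, which is one reason to report the measures together.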
- ItemText detection in natural images using convolutional neural networks(Stellenbosch : Stellenbosch University, 2017-03) Grond, Marco Marten; Brink, Willie; Herbst, B. M.; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Applied Mathematics. ENGLISH ABSTRACT: In this study we attempt to solve the problem of text detection in natural images. This requires us to identify regions in a natural image that contain text. Possible applications include assistive technology, human-computer interaction and context extraction. Although humans find the task almost trivial, large variations in colour, font, size and orientation must be accounted for, and text shares many features and structures with other objects that cause complications when attempting to automate a solution. We train multiple convolutional neural networks in an attempt to solve this problem. We chose convolutional neural networks both because they have already displayed potential in the context of text recognition, and to better understand how they operate. A sliding window approach is taken, where smaller regions of a full image are classified separately before the results are combined to identify text regions in the full image. Due to an insufficient number of annotated natural training images, we create a supplementary synthetic dataset. Using the synthetic data as a starting point we train networks of different structures, after which the same networks are finetuned on smaller natural datasets. Networks first trained on the synthetic data outperform networks trained solely on the smaller natural datasets, regardless of structure complexity. This is likely due to an inability to identify relevant features from a limited number of training examples. Our experiments further show that a larger network structure is required for generalization, and that smaller datasets are prone to overfitting.
We apply our best performing trained network to the task of detecting text in full images, by extracting and classifying regions in an image using a sliding window. Image pyramids are also implemented to allow for greater variance in the size of text that can be detected. We find, however, that implementing image pyramids only slightly improves the accuracy over a single image, likely due to the fact that some scale variation was already present in the network’s training set. Ultimately, we find that convolutional neural networks show promise for the task of text detection in natural images. We also find that training a network on synthetic data and finetuning it on natural data improves the overall accuracy.
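The sliding window and image pyramid procedure described in the abstract above can be sketched as follows. This is an illustrative outline only: the image is a plain list of pixel rows, `classify` stands in for the trained network, and the nearest-neighbour resizing, window size, stride and scale factor are all assumptions rather than the settings used in the thesis.

```python
def resize_nearest(image, new_w, new_h):
    """Nearest-neighbour resize of an image stored as a list of rows."""
    h, w = len(image), len(image[0])
    return [
        [image[r * h // new_h][c * w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def detect(image, classify, window=32, stride=16, scale=1.5):
    """Return (x, y, w, h) boxes, in original coordinates, flagged by classify.

    classify(patch) is a hypothetical stand-in for the trained text classifier.
    """
    h, w = len(image), len(image[0])
    boxes = []
    cur_w, cur_h = w, h
    # Each pyramid level shrinks the image, so a fixed window covers larger text.
    while cur_w >= window and cur_h >= window:
        level = resize_nearest(image, cur_w, cur_h)
        fx, fy = w / cur_w, h / cur_h
        for y in range(0, cur_h - window + 1, stride):
            for x in range(0, cur_w - window + 1, stride):
                patch = [row[x:x + window] for row in level[y:y + window]]
                if classify(patch):
                    boxes.append((round(x * fx), round(y * fy),
                                  round(window * fx), round(window * fy)))
        cur_w, cur_h = int(cur_w / scale), int(cur_h / scale)
    return boxes
```

In practice, overlapping detections from different windows and pyramid levels would still need to be merged into final text regions, a step this sketch leaves out.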
- ItemThermal and colour data fusion for people detection and tracking(Stellenbosch : Stellenbosch University, 2014-04) Joubert, Pierre; Brink, Willie; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. ENGLISH ABSTRACT: In this thesis we approach the problem of tracking multiple people individually in a video sequence. Automatic object detection and tracking is non-trivial as humans have complex and mostly unpredictable movements, and there are sensor noise and measurement uncertainties present. We consider traditional object detection methods and decide to use thermal data for the detection step. This choice is supported by the robustness of thermal data compared to colour data in unfavourable lighting conditions and in surveillance applications. A drawback of using thermal data is that we lose colour information, since the sensor interprets the heat emission of the body rather than visible light. We incorporate a colour sensor which is used to build features for each detected object. These features are used to help determine correspondences in detected objects over time. A problem with traditional blob detection algorithms, which typically consist of background subtraction followed by connected-component labelling, is that objects can appear to split or merge, or disappear in a few frames. We decide to add ‘dummy’ blobs in an effort to counteract these problems. We refrain from making any hard decisions with respect to the blob correspondences over time, and rather let the system decide which correspondences are more probable. Furthermore, we find that the traditional Markovian approach of determining correspondences between detected blobs in the current time step and only the previous time step can lead to unwanted behaviour. We rather consider a sequence of time steps and optimize the tracking across them. We build a composite correspondence model and weigh each correspondence according to similarity (correlation) in object features.
All possible tracks are determined through this model and a likelihood is calculated for each. Using the best scoring tracks we then label all the detections and use this labelling as measurement input for a tracking filter. We find that the window tracking approach shows promise even though the data we use for testing is of poor quality and noisy. The system struggles with cluttered scenes and when a lot of dummy nodes are present. Nonetheless, our findings act as a proof of concept and we discuss a few future improvements that can be considered.