The class imbalance problem in computer vision

dc.contributor.advisor    Brink, Willie    en_ZA
dc.contributor.author    Crous, Willem Hendrik    en_ZA
dc.contributor.other    Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences (Applied Mathematics)    en_ZA
dc.description    Thesis (MSc)--Stellenbosch University, 2022.    en_ZA
dc.description.abstract    ENGLISH ABSTRACT: Class imbalance is a naturally occurring phenomenon, typically characterised as a dataset consisting of classes with varying numbers of samples. When trained on class imbalanced data, networks tend to favour frequently occurring (majority) classes over the less frequent (minority) classes. This poses challenges for tasks reliant upon accurate recognition of the less frequent classes. The aim of this thesis is to investigate general methods towards addressing this problem. First we establish why a network may favour majority classes. We contend that as less frequent classes are likely to under-represent the required underlying distribution for a given task, training may produce a decision boundary that transgresses the feature space of minority classes. Additionally, we find that the weight norms of the classification layer in a neural network may tend towards the distribution of the training data, thus affecting the decision boundary. We determine that this decision boundary shift impacts both the accuracy and confidence calibration of neural networks. We investigate several approaches to shift the decision boundary. The first approach acquires additional data and increases the representation of minority classes. This is achieved through either creating synthetic samples following a distribution-aware regularisation method, or utilising additional unlabelled data in a semi-supervised setting. The second approach aims to adjust the classifier weight norms by separately training the classifier and feature extractor. We find that implementing an effective regularisation method with a simple decoupled sampling scheme can provide considerable improvements over standard sampling methods. Furthermore, we find that utilising additional unlabelled data may lead to additional gains, provided certain dataset characteristics are taken into consideration.    en_ZA
dc.description.abstract    AFRIKAANS SUMMARY: No summary available    af_ZA
dc.format.extent    76 pages    en_ZA
dc.publisher    Stellenbosch : Stellenbosch University    en_ZA
dc.subject    Computer vision    en_ZA
dc.subject    Image classification    en_ZA
dc.subject    Class imbalance    en_ZA
dc.subject    Data sets -- Characteristic    en_ZA
dc.title    The class imbalance problem in computer vision    en_ZA
dc.rights.holder    Stellenbosch University    en_ZA
