The class imbalance problem in computer vision
Date
2022-04
Authors
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT: Class imbalance is a naturally occurring phenomenon, typically characterised by a dataset whose classes contain varying numbers of samples. When trained on class-imbalanced data, networks tend to favour frequently occurring (majority) classes over less frequent (minority) classes. This poses challenges for tasks that rely on accurate recognition of the less frequent classes. The aim of this thesis is to investigate general methods for addressing this problem. First, we establish why a network may favour majority classes. We contend that, as less frequent classes are likely to under-represent the underlying distribution required for a given task, training may produce a decision boundary that transgresses the feature space of minority classes. Additionally, we find that the weight norms of the classification layer in a neural network may tend towards the class distribution of the training data, thus affecting the decision boundary. We determine that this decision boundary shift impacts both the accuracy and the confidence calibration of neural networks. We investigate several approaches for shifting the decision boundary. The first approach acquires additional data and increases the representation of minority classes. This is achieved either by creating synthetic samples following a distribution-aware regularisation method, or by utilising additional unlabelled data in a semi-supervised setting. The second approach aims to adjust the classifier weight norms by training the classifier and the feature extractor separately. We find that implementing an effective regularisation method with a simple decoupled sampling scheme can provide considerable improvements over standard sampling methods. Furthermore, we find that utilising additional unlabelled data may lead to additional gains, provided that certain dataset characteristics are taken into consideration.
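The abstract states that the per-class weight norms of the classification layer may tend towards the training distribution, and that separating classifier and feature-extractor training can adjust these norms. As an illustration only, the NumPy sketch below measures per-class weight norms and applies τ-normalisation, a post-hoc rescaling from work on decoupled long-tailed learning (Kang et al., 2020). It is not necessarily the adjustment used in this thesis; the weight matrix `W`, its `(num_classes, feature_dim)` layout and the `tau` and `eps` parameters are assumptions for the example.

```python
import numpy as np

def class_weight_norms(W: np.ndarray) -> np.ndarray:
    """L2 norm of each class's weight vector; W is assumed to have shape (num_classes, feature_dim)."""
    return np.linalg.norm(W, axis=1)

def tau_normalise(W: np.ndarray, tau: float = 1.0, eps: float = 1e-12) -> np.ndarray:
    """Rescale each row w_j of W to w_j / ||w_j|| ** tau.

    tau = 0 leaves W unchanged; tau = 1 gives every class a unit-norm weight vector.
    eps (an assumption) guards against division by zero for an all-zero row.
    """
    norms = np.maximum(class_weight_norms(W), eps)
    return W / norms[:, None] ** tau

# Hypothetical weights: the first (majority) class has the largest norm, the last (minority) the smallest.
W = np.array([[3.0, 4.0],
              [0.0, 2.0],
              [0.6, 0.8]])
print(class_weight_norms(W))                 # approximately [5, 2, 1]
print(class_weight_norms(tau_normalise(W)))  # all approximately 1
```

Intermediate values of `tau` only partially equalise the norms, which is why τ-normalisation treats it as a tunable parameter rather than fixing it at 1.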
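The abstract reports that the decision boundary shift affects confidence calibration. One widely used way to quantify miscalibration is the Expected Calibration Error (ECE), popularised for neural networks by Guo et al. (2017). The sketch below is a minimal version with equal-width confidence bins; the bin count of 10 and the toy predictions are assumptions, and the thesis may measure calibration differently.

```python
import numpy as np

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray, n_bins: int = 10) -> float:
    """ECE with equal-width confidence bins.

    probs:  (num_samples, num_classes) predicted class probabilities.
    labels: (num_samples,) integer ground-truth labels.
    """
    confidences = probs.max(axis=1)           # confidence of the predicted class
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |accuracy - mean confidence| in this bin, weighted by the fraction of samples in it
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)

# Toy predictions: over-confident on the first two samples, under-confident on the last two.
probs = np.array([[0.9, 0.1], [0.9, 0.1], [0.6, 0.4], [0.6, 0.4]])
labels = np.array([0, 1, 0, 0])
print(round(expected_calibration_error(probs, labels), 6))  # 0.4
```

A perfectly calibrated model has an ECE of 0: within every bin, the average confidence equals the observed accuracy.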
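The abstract mentions creating synthetic samples with a distribution-aware regularisation method but does not name it here. One published distribution-aware mixup variant is Remix (Chou et al., 2020): inputs are mixed as in standard mixup, but when the two classes' sample counts differ by at least a factor `kappa`, the mixed label is assigned to the minority class. The sketch follows our reading of that rule and should not be taken as the thesis's method; `kappa = 3`, `tau = 0.5` and the example samples are assumptions.

```python
import numpy as np

def remix_pair(x_i, y_i, n_i, x_j, y_j, n_j, lam, kappa=3.0, tau=0.5):
    """Mix two samples Remix-style.

    y_i, y_j are one-hot label vectors; n_i, n_j are the training counts of their classes;
    lam is the mixing coefficient (normally drawn from a Beta distribution).
    """
    x_i, y_i, x_j, y_j = (np.asarray(a, dtype=float) for a in (x_i, y_i, x_j, y_j))
    x_mix = lam * x_i + (1.0 - lam) * x_j
    if n_i / n_j >= kappa and lam < tau:
        lam_y = 0.0   # class i is much larger: the label goes entirely to minority class j
    elif n_i / n_j <= 1.0 / kappa and 1.0 - lam < tau:
        lam_y = 1.0   # class j is much larger: the label goes entirely to minority class i
    else:
        lam_y = lam   # otherwise behave like standard mixup
    y_mix = lam_y * y_i + (1.0 - lam_y) * y_j
    return x_mix, y_mix

# Hypothetical pair: class 0 has 900 training samples, class 1 only 100.
x_mix, y_mix = remix_pair([1.0, 1.0], [1.0, 0.0], 900, [0.0, 0.0], [0.0, 1.0], 100, lam=0.3)
print(x_mix, y_mix)  # inputs mixed 30/70, label assigned fully to the minority class 1
```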
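For the semi-supervised route, one simple way to use unlabelled data is confidence-thresholded pseudo-labelling. The sketch below assumes a model has already produced class probabilities for the unlabelled images; the 0.95 threshold and the example probabilities are hypothetical, and the thesis does not state that it uses this particular scheme. Counting the accepted pseudo-labels per class is one way to inspect a dataset characteristic that matters under imbalance: if the confident predictions are mostly majority-class, the added data may do little to improve minority-class representation.

```python
import numpy as np

def select_pseudo_labels(probs: np.ndarray, threshold: float = 0.95):
    """Keep unlabelled samples whose top predicted probability reaches `threshold`.

    probs: (num_unlabelled, num_classes) model predictions on unlabelled data.
    Returns (indices of kept samples, their pseudo-labels).
    """
    confidences = probs.max(axis=1)
    keep = np.flatnonzero(confidences >= threshold)
    return keep, probs[keep].argmax(axis=1)

# Hypothetical predictions on four unlabelled images over three classes.
probs = np.array([[0.97, 0.02, 0.01],
                  [0.50, 0.30, 0.20],
                  [0.01, 0.96, 0.03],
                  [0.98, 0.01, 0.01]])
idx, pseudo = select_pseudo_labels(probs)
# Per-class count of accepted pseudo-labels; a skew towards class 0 would mirror the labelled imbalance.
print(np.bincount(pseudo, minlength=probs.shape[1]))  # [2 1 0]
```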
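The abstract does not specify the decoupled sampling scheme. A common two-stage recipe, assumed here for illustration, first trains the whole network with instance-balanced sampling and then retrains only the classifier with class-balanced sampling. The sketch uses the sampling family p_j ∝ n_j^q described by Kang et al. (2020), where n_j is the number of samples in class j; the function names, `q` values and the 90:9:1 label set are hypothetical.

```python
import numpy as np

def class_sampling_probs(class_counts, q: float) -> np.ndarray:
    """p_j = n_j**q / sum_i n_i**q.

    q = 1: instance-balanced (every sample equally likely),
    q = 0: class-balanced (every class equally likely),
    q = 0.5: square-root sampling.
    """
    counts = np.asarray(class_counts, dtype=float)
    weights = counts ** q
    return weights / weights.sum()

def per_sample_weights(labels: np.ndarray, q: float) -> np.ndarray:
    """Probability of drawing each individual sample; assumes every class has at least one sample."""
    counts = np.bincount(labels)
    class_p = class_sampling_probs(counts, q)
    # Split each class's probability mass evenly among that class's samples.
    return class_p[labels] / counts[labels]

rng = np.random.default_rng(0)
labels = np.array([0] * 90 + [1] * 9 + [2] * 1)          # hypothetical 90:9:1 imbalance
print(class_sampling_probs(np.bincount(labels), q=1.0))  # 0.9, 0.09, 0.01
print(class_sampling_probs(np.bincount(labels), q=0.0))  # one third each
# Assumed stage 2 of a decoupled scheme: draw a class-balanced batch to retrain only the classifier.
batch_idx = rng.choice(len(labels), size=32, p=per_sample_weights(labels, q=0.0))
```

Under q = 0 the single class-2 sample is drawn with probability 1/3 per draw, so it appears repeatedly in each batch; this oversampling is the intended effect when retraining only the classifier on top of a fixed feature extractor.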
AFRIKAANS SUMMARY: No summary available.
Description
Thesis (MSc)--Stellenbosch University, 2022.
Keywords
Computer vision, Image classification, Class imbalance, UCTD, Data sets -- Characteristic