The class imbalance problem in computer vision

Crous, Willem Hendrik

Files

crous_class_2022.pdf (5.25 MB)

Date

2022-04

Authors

Crous, Willem Hendrik

Publisher

Stellenbosch : Stellenbosch University

Abstract

ENGLISH ABSTRACT: Class imbalance is a naturally occurring phenomenon, typically characterised as a dataset consisting of classes with varying numbers of samples. When trained on class-imbalanced data, networks tend to favour frequently occurring (majority) classes over the less frequent (minority) classes. This poses challenges for tasks reliant upon accurate recognition of the less frequent classes. The aim of this thesis is to investigate general methods for addressing this problem. First, we establish why a network may favour majority classes. We contend that, as less frequent classes are likely to under-represent the required underlying distribution for a given task, training may produce a decision boundary that transgresses the feature space of minority classes. Additionally, we find that the weight norms of the classification layer in a neural network may tend towards the distribution of the training data, thus affecting the decision boundary. We determine that this decision boundary shift impacts both the accuracy and confidence calibration of neural networks. We investigate several approaches to shift the decision boundary. The first approach acquires additional data and increases the representation of minority classes. This is achieved through either creating synthetic samples following a distribution-aware regularisation method, or utilising additional unlabelled data in a semi-supervised setting. The second approach aims to adjust the classifier weight norms by separately training the classifier and feature extractor. We find that implementing an effective regularisation method with a simple decoupled sampling scheme can provide considerable improvements over standard sampling methods. Furthermore, we find that utilising additional unlabelled data may lead to additional gains, provided certain dataset characteristics are taken into consideration.
AFRIKAANS SUMMARY: No summary available
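
As a rough illustration of the imbalance the abstract describes, the sketch below measures how skewed a label set is and derives class-balanced sampling weights, so that each class is drawn with equal total probability. This is a generic resampling baseline, not necessarily the sampling scheme used in the thesis. The function names (`imbalance_ratio`, `class_balanced_weights`) and the toy labels are assumptions made for the example.

```python
from collections import Counter
import random


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def class_balanced_weights(labels):
    """One sampling weight per sample; each class receives equal total weight."""
    counts = Counter(labels)
    num_classes = len(counts)
    return [1.0 / (num_classes * counts[y]) for y in labels]


# Hypothetical long-tailed label list: one majority and two minority classes.
labels = ["cat"] * 90 + ["dog"] * 9 + ["fox"] * 1
weights = class_balanced_weights(labels)

# Draw a resampled training set in which classes appear roughly equally often.
rng = random.Random(0)
resampled = rng.choices(labels, weights=weights, k=3000)
resampled_counts = Counter(resampled)
```

With weights of this form, the single "fox" sample is repeated many times in the resampled set. Heavy repetition of that kind can encourage overfitting to the minority class, which may be one reason to combine resampling with regularisation or additional data, as the abstract suggests.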

Description

Thesis (MSc)--Stellenbosch University, 2022.

Keywords

Computer vision, Image classification, Class imbalance, UCTD, Data sets -- Characteristic

URI

http://hdl.handle.net/10019.1/124719

Collections

Masters Degrees (Applied Mathematics)
