Resampling algorithms for multi-label classification

dc.contributor.advisorSandrock, Trudieen_ZA
dc.contributor.authorKotze, Ulrichen_ZA
dc.contributor.otherStellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.en_ZA
dc.date.accessioned2022-03-04T09:03:42Z
dc.date.accessioned2022-04-29T09:29:14Z
dc.date.available2022-03-04T09:03:42Z
dc.date.available2022-04-29T09:29:14Z
dc.date.issued2022-04
dc.descriptionThesis (MCom)--Stellenbosch University, 2022.en_ZA
dc.description.abstractENGLISH SUMMARY: Multi-label classification is a member of the supervised learning family and represents a scenario where we wish to classify an observation into many of many classes. Therefore, in the classification paradigm an observation can belong to more than one class simultaneously. Imbalanced data is a common problem in the multi-label paradigm of learning. This project investigated resampling algorithms as a pre-processing mechanism to address the manifestation of imbalance in multi-label data to improve multi-label classification performance. Imbalance can manifest itself through a sparse data matrix at small global densities. Imbalance can also manifest itself through a disparity in local label density at larger global densities. The effect of resampling algorithms on multi-label performance is studied for both of these forms of imbalance. We specifically study the effect of these resampling algorithms on multi-label performance at changing levels of global density. The thesis made use of simulated data, five common multi-label classification techniques and seven of the most popular resampling algorithms. Three example-based, label-based and ranking-based evaluation metrics were used to assess the effect of the resampling algorithms on multi-label classification performance.en_ZA
dc.description.abstractAFRIKAANSE OPSOMMING: Multi-etiket klassifikasie is 'n voorbeeld van onder toesig leer en verteenwoordig 'n scenario waarin ons 'n waarneming in baie van baie klasse wil klassifiseer. Daarom kan 'n waarneming in 'n klassifikasieparadigma gelyktydig aan meer as een klas behoort. Ongebalanseerde data is 'n algemene probleem in die multi-etiket paradigma van leer. Hierdie tesis het hersteekproefnemingalgoritmes ondersoek as 'n voorverwerkingsmeganisme om die manifestasie van wanbalans in multi-etiket data aan te spreek om multi-etiket klassifikasieprestasie te verbeter. Wanbalans kan manifesteer deur 'n yl data matriks by klein globale digthede of deur 'n verskil in plaaslike etiketdigtheid by groter globale digthede. Die effek van hersteekproefnemingalgoritmes op multi-etiket prestasie word bestudeer vir beide hierdie vorme van wanbalans. Ons bestudeer spesifiek die effek van hierdie hersteekproefnemingalgoritmes op multi-etiket prestasie by veranderende vlakke van globale digtheid. Die studie het gebruik gemaak van gesimuleerde data, vyf algemene multi-etiket klassifikasietegnieke en sewe van die gewildste hersteekproefnemingalgoritmes. Drie voorbeeld-gebaseerde, etiket-gebaseerde en ranglys-gebaseerde evalueringsmetings is gebruik om die effek van die hersteekproefnemingalgoritmes op multi-etiket klassifikasieprestasie te bepaal.af_ZA
dc.description.versionMasters
dc.format.extent159 pages : illustrations
dc.identifier.urihttp://hdl.handle.net/10019.1/124730
dc.language.isoen_ZAen_ZA
dc.publisherStellenbosch : Stellenbosch University
dc.rights.holderStellenbosch University
dc.subjectStatistics -- Data processingen_ZA
dc.subjectAlgorithmsen_ZA
dc.subjectComputer algorithmsen_ZA
dc.subjectUCTD
dc.titleResampling algorithms for multi-label classificationen_ZA
dc.typeThesisen_ZA
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
kotze_resampling_2022.pdf
Size:
10.37 MB
Format:
Adobe Portable Document Format
Description: