Feature selection for multi-label classification

dc.contributor.advisor: Steel, S. J. (en_ZA)
dc.contributor.author: Contardo-Berning, Ivona E. (en_ZA)
dc.contributor.other: Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Economics. (en_ZA)
dc.date.accessioned: 2020-11-27T06:23:42Z
dc.date.accessioned: 2021-01-31T19:41:05Z
dc.date.available: 2020-11-27T06:23:42Z
dc.date.available: 2021-01-31T19:41:05Z
dc.date.issued: 2020-12
dc.description: Thesis (PhD)--Stellenbosch University, 2020. (en_ZA)
dc.description.abstract: ENGLISH ABSTRACT: The field of multi-label learning is a popular new research focus. In the multi-label setting, a data instance can be associated simultaneously with a set of labels instead of only a single label. This dissertation reviews the subject of multi-label classification, emphasising some of the notable developments in the field. Multi-label datasets are typically complex, and dimensionality reduction might aid in their analysis. The notion of feature selection is therefore introduced and discussed briefly in this dissertation. A new procedure for multi-label feature selection is proposed. This new procedure, relevance pattern feature selection (RPFS), utilises the methodology of the graphical technique of Multiple Correspondence Analysis (MCA) biplots to perform feature selection. An empirical evaluation of the proposed technique is performed using a benchmark multi-label dataset and synthetic multi-label datasets. For the benchmark dataset it is shown that the proposed procedure achieves results similar to the full model, while using significantly fewer features. The empirical evaluation of the procedure on the synthetic datasets shows that, for the majority of the methods, the results achieved by the reduced sets of features are better than those achieved with the full set of features. The proposed procedure is then compared to two established multi-label feature selection techniques using the synthetic datasets. The results again show that the proposed procedure is effective. (en_ZA)
dc.description.abstract: AFRIKAANSE OPSOMMING: Die veld van multi-etiket leerteorie is ’n gewilde nuwe navorsingsarea. In die multi-etiket omgewing kan ’n datageval gelyktydig geassosieer word met ’n stel etikette in plaas van met slegs ’n enkele etiket. Hierdie verhandeling verskaf ’n oorsig oor die onderwerp van multi-etiket klassifikasie en beklemtoon sekere noemenswaardige ontwikkelings in die veld. Die aard van multi-etiket datastelle leen homself tipies tot komplekse datastelle waar dimensie reduksie die analise van hierdie datastelle kan vergemaklik. Die konsep van veranderlike seleksie word dus voorgestel en kortliks in hierdie verhandeling bespreek. ’n Nuwe prosedure vir multi-etiket veranderlike seleksie word voorgestel. Hierdie nuwe prosedure, relevansie patroon veranderlike seleksie (RPFS), maak gebruik van die metodologie van die grafiese tegniek van Meervoudige Ooreenstemmingsanalise bi-stippings om veranderlike seleksie uit te voer. ’n Empiriese evaluering van die voorgestelde tegniek is uitgevoer met behulp van ’n norm multi-etiket datastel en sintetiese multi-etiket datastelle. Vir die norm datastel word aangetoon dat die voorgestelde prosedure soortgelyke resultate lewer as die volledige model, maar met beduidend minder veranderlikes. Die empiriese evaluering van die prosedure op die sintetiese datastelle toon dat die resultate wat deur die gereduseerde stel veranderlikes gelewer word, beter is as dié wat met die volledige stel veranderlikes gelewer is, vir die meerderheid van die metodes. Die voorgestelde prosedure word dan vergelyk met twee gevestigde multi-etiket veranderlike seleksie tegnieke met behulp van die sintetiese datastelle. Die resultate toon weereens dat die voorgestelde prosedure effektief is. (af_ZA)
dc.description.version: Doctoral
dc.format.extent: xxiv, 686 pages ; illustrations, includes annexures
dc.identifier.uri: http://hdl.handle.net/10019.1/109247
dc.language.iso: en_ZA (en_ZA)
dc.publisher: Stellenbosch : Stellenbosch University (en_ZA)
dc.rights.holder: Stellenbosch University (en_ZA)
dc.subject: Multi-label classification (en_ZA)
dc.subject: Correspondence analysis (Statistics) (en_ZA)
dc.subject: Biplots (en_ZA)
dc.subject: UCTD
dc.title: Feature selection for multi-label classification (en_ZA)
dc.type: Thesis (en_ZA)
Files
Original bundle
Name: contardoberning_feature_2020.pdf
Size: 26.42 MB
Format: Adobe Portable Document Format