Feature selection for multi-label classification

Date
2020-12
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT : Multi-label learning is a popular new research focus. In the multi-label setting, a data instance can be associated with a set of labels simultaneously, rather than with a single label. This dissertation reviews multi-label classification, emphasising some of the notable developments in the field. Multi-label datasets are typically complex, so dimensionality reduction may aid their analysis. Feature selection is therefore introduced and discussed briefly. A new procedure for multi-label feature selection, relevance pattern feature selection (RPFS), is proposed. It uses the methodology of Multiple Correspondence Analysis (MCA) biplots, a graphical technique, to perform feature selection. The proposed technique is evaluated empirically on a benchmark multi-label dataset and on synthetic multi-label datasets. On the benchmark dataset, the procedure achieves results similar to those of the full model while using significantly fewer features. On the synthetic datasets, the reduced feature sets give better results than the full feature set for the majority of the methods. The proposed procedure is then compared with two established multi-label feature selection techniques on the synthetic datasets. The results again show that the proposed procedure is effective.
AFRIKAANS SUMMARY (translated) : The field of multi-label learning is a popular new research area. In the multi-label setting, a data instance can be associated simultaneously with a set of labels rather than with only a single label. This dissertation provides an overview of multi-label classification and highlights certain notable developments in the field. Multi-label datasets tend to be complex, and dimension reduction can ease their analysis. The concept of feature selection is therefore introduced and discussed briefly in this dissertation. A new procedure for multi-label feature selection is proposed. This new procedure, relevance pattern feature selection (RPFS), uses the methodology of the graphical technique of Multiple Correspondence Analysis biplots to carry out feature selection. An empirical evaluation of the proposed technique was performed using a benchmark multi-label dataset and synthetic multi-label datasets. For the benchmark dataset, it is shown that the proposed procedure delivers results similar to those of the full model, but with significantly fewer features. The empirical evaluation of the procedure on the synthetic datasets shows that the results delivered by the reduced set of features are better than those delivered by the full set of features for the majority of the methods. The proposed procedure is then compared with two established multi-label feature selection techniques using the synthetic datasets. The results again show that the proposed procedure is effective.
Description
Thesis (PhD)--Stellenbosch University, 2020.
Keywords
Multi-label classification, Correspondence analysis (Statistics), Biplots, UCTD