A study of fairness in machine learning in the presence of missing values

dc.contributor.advisor: Sandrock, Trudy [en_ZA]
dc.contributor.author: Bhatti, Aeysha Aziz [en_ZA]
dc.contributor.other: Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. [en_ZA]
dc.date.accessioned: 2023-06-29T08:26:15Z
dc.date.accessioned: 2023-03-01T08:54:25Z
dc.date.available: 2023-06-29T08:26:15Z
dc.date.available: 2023-03-01T08:54:25Z
dc.date.issued: 2023-03
dc.description: Thesis (MCom)--Stellenbosch University, 2023. [en_ZA]
dc.description.abstract: ENGLISH SUMMARY: Fairness of machine learning algorithms is a topic that is receiving increasing attention, as more and more algorithms permeate the day-to-day aspects of our lives. One way in which bias can manifest in a data source is through missing values. If data are missing, they are often assumed to be missing completely at random, but usually this is not the case. In reality, the propensity of data to be missing is often tied to the socio-economic status or demographic characteristics of individuals. There is very limited research into how missing values and missing value handling methods can impact the fairness of an algorithm. In this research, we conduct a systematic study starting from the foundational questions of how the data are missing, how the missing data are dealt with and how this impacts fairness, based on the outcome of a few different types of machine learning algorithms. Most researchers, when dealing with missing data, either apply listwise deletion or tend to use the simpler methods of imputation rather than the more complex ones. We study the impact of these simpler methods on the fairness of algorithms. Our results show that the missing data mechanism and the missing data handling procedure can impact the fairness of an algorithm, and that under certain conditions the simpler imputation methods can be beneficial in decreasing discrimination. [en_ZA]
dc.description.abstract: AFRIKAANSE OPSOMMING: Die regverdigheid van masjienleeralgoritmes is ’n onderwerp wat toenemend aandag geniet, soos al hoe meer algoritmes elke aspek van ons alledaagse lewens deurdring. Een manier waarop sydigheid in ’n databron kan manifesteer is deur ontbrekende waardes. Indien daar ontbrekende data is, word daar dikwels aanvaar dat die data op ’n algeheel ewekansige manier ontbrekend is, maar dit is gewoonlik nie die geval nie. In werklikheid is die geneigdheid vir die afwesigheid van data dikwels verwant aan sosio-ekonomiese status of demografiese eienskappe van individue. Daar is baie beperkte navorsing oor hoe ontbrekende waardes en die hantering daarvan die regverdigheid van algoritmes kan beïnvloed. In hierdie navorsing voer ons ’n sistematiese studie uit, met die basiese vrae as beginpunt, soos op watter manier die data ontbrekend is, hoe die ontbrekende waardes hanteer word en hoe dit regverdigheid beïnvloed, gebaseer op die uitkoms van ’n paar verskillende masjienleeralgoritmes. Meeste navorsers gebruik skrappingsmetodes of eenvoudige imputasiemetodes eerder as meer komplekse metodes wanneer hulle met ontbrekende waardes gekonfronteer word. Ons ondersoek die impak van hierdie eenvoudiger metodes op die regverdigheid van algoritmes. Ons resultate toon dat die onderliggende ontbrekende waarde meganisme en die prosedure vir die hantering van ontbrekende waardes die regverdigheid van ’n algoritme kan beïnvloed, en dat onder sekere kondisies die eenvoudiger imputasiemetodes soms kan help om diskriminasie te verminder. [af_ZA]
dc.description.version: Masters
dc.embargo.terms: 2023-09-01
dc.format.extent: xi, 125 pages : illustrations, includes annexures
dc.identifier.uri: https://scholar.sun.ac.za/handle/10019.1/127442
dc.language.iso: en_ZA [en_ZA]
dc.publisher: Stellenbosch : Stellenbosch University
dc.rights.holder: Stellenbosch University
dc.subject.lcsh: Machine learning -- Algorithms [en_ZA]
dc.subject.lcsh: Machine learning -- Mathematical models [en_ZA]
dc.subject.lcsh: Neural networks (Computer science) [en_ZA]
dc.subject.name: UCTD
dc.title: A study of fairness in machine learning in the presence of missing values [en_ZA]
dc.type: Thesis
Files
Original bundle
Name:
bhatti_fairness_2023.pdf
Size:
3.7 MB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission