Automated payment fraud detection using logistic regression and support vector machines

Thetard, Heinrich Mathias

dc.contributor.advisor	Nel, J. H.	en_ZA
dc.contributor.author	Thetard, Heinrich Mathias	en_ZA
dc.contributor.other	Stellenbosch University. Faculty of Economic and Management Science. Dept. of Logistics.	en_ZA
dc.date.accessioned	2021-03-06T16:40:04Z
dc.date.accessioned	2021-04-21T14:36:56Z
dc.date.available	2021-03-06T16:40:04Z
dc.date.available	2021-04-21T14:36:56Z
dc.date.issued	2021-03
dc.description	Thesis (MComm)--Stellenbosch University, 2021.	en_ZA
dc.description.abstract	ENGLISH ABSTRACT: The financial technology sector is a fast-moving environment. Many innovations in automation and efficiency reduce the need for human intervention while processing speed increases rapidly. This is evident in the payments space, where payments are processed faster each year and the vast majority of transactions are driven automatically. This has opened up a platform for fraudsters to operate on. The use of machine learning (ML) in fraud detection has grown in popularity. Two methods used to identify fraud, logistic regression (LR) and support vector machines (SVMs), are investigated in this thesis. LR is less complex than SVMs, but there are specific situations in which SVMs outperform any other ML model [31]. Each method is assessed against its application conditions and measured using a set of confusion-matrix-based metrics. The two methods are applied to a data set from a bank that participates in the automated payment environment. It was evident that the sample proportions selected had a major impact on model performance, especially with regard to sensitivity and specificity. Because this was an exercise in fraud identification, sensitivity is the most important metric. This may not be the case for all data sets and environments, as the cost of investigating false positives may be higher than the actual cost of the fraud prevented. Condition testing and post-model application diagnostics were applied in this research. It was evident that principal component analysis (PCA) feature selection was inferior to stepwise feature selection. The relatively poor performance of the PCA feature selection models is due to a loss of information when variables are removed in choosing the components. When considering the odds ratios for LR, several variables were protective factors and others were risk factors. 
Protective factors decreased, and risk factors increased, the odds of a case being fraudulent. It was found that a debit order (DO) associated with an older person was more likely to be fraudulent than a DO associated with a younger person. It was also found that if a DO had a value of R99 or R45, the odds of the case being fraudulent increased several-fold. LR models produced results equivalent to those of the more complex SVM models, with a much shorter run time. From a practical point of view, this means that LR is preferred on larger data sets.	en_ZA
dc.description.abstract	AFRIKAANS SUMMARY (translated into English): The financial technology sector is a fast-moving environment. There are many innovations in the field of automation and efficiency, where human intervention is less necessary and processing speed is increasing rapidly. In the payments space it is apparent that payments are processed faster each year, with the vast majority of payment transactions processed automatically. This has created a platform for fraudsters. Consequently, the popularity of machine learning (ML) in fraud detection continues to grow. Two methods, logistic regression (LR) and support vector machines (SVMs), are used to identify fraud and are investigated in this thesis. LR is less complex than SVMs, but there are unique situations in which SVMs will perform better than any other ML model. Each of these methods is assessed on the basis of application conditions, and performance is measured using a certain set of metrics based on the confusion matrix. The two methods are applied to a data set from a bank that participates in the automated payment environment. It was clear that the selected sample proportions had a large influence on model performance, sensitivity and specificity. In this study the identification of fraud is the objective, and the measurement of sensitivity is therefore the most important. This is perhaps not the case for all data sets and environments, since the cost of investigating false positive cases can be higher than the actual cost of preventing fraud. Testing of conditions and analysis of post-model diagnostics were applied in this research. It was clear that principal component analysis (PCA) performed worse than stepwise selection methods. 
The relatively poor performance of the PCA selection models is attributable to the loss of information when variables are eliminated in choosing the components. When considering the odds ratios for LR, there were various variables that were protective factors and others that were risk factors. These factors increased or decreased the odds of fraud cases. It was found that when a debit order (DO) is associated with an older person, it is more likely to be classified as fraud than when the DO is associated with a younger person. It was also found that if a DO has a value of R99 or R45, the odds that it will be a fraud case increase further. LR models deliver results equivalent to the more complex SVM models with a much better run time. From a practical point of view, this means that LR models will be preferred for larger data sets.	af_ZA
dc.description.version	Masters
dc.format.extent	125 pages	en_ZA
dc.identifier.uri	http://hdl.handle.net/10019.1/110024
dc.language.iso	en_ZA	en_ZA
dc.publisher	Stellenbosch : Stellenbosch University	en_ZA
dc.rights.holder	Stellenbosch University	en_ZA
dc.subject	Logistic regression analysis	en_ZA
dc.subject	Machine learning	en_ZA
dc.subject	Support vector machines	en_ZA
dc.subject	SVMs (Algorithms)	en_ZA
dc.subject	Automated tellers	en_ZA
dc.subject	ATMs (Banking)	en_ZA
dc.subject	Remote sensing	en_ZA
dc.subject	Contingency tables -- Computer programs	en_ZA
dc.subject	Commercial crimes	en_ZA
dc.subject	Banks and banking -- Security measures	en_ZA
dc.subject	Bank fraud	en_ZA
dc.subject	UCTD
dc.title	Automated payment fraud detection using logistic regression and support vector machines	en_ZA
dc.type	Thesis	en_ZA
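
The abstract refers to confusion-matrix-based metrics (sensitivity and specificity) and to odds ratios from logistic regression. The following is a minimal sketch of how such quantities are commonly computed, not the thesis's own code. The function names and the toy labels are hypothetical and chosen only for illustration; the only assumptions are that fraud is coded as 1 and legitimate as 0, and that an odds ratio is the exponential of a fitted LR coefficient.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = fraud, 0 = legitimate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Share of actual fraud cases that the model flags."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


def specificity(tn, fp):
    """Share of legitimate cases that the model leaves unflagged."""
    return tn / (tn + fp) if (tn + fp) else float("nan")


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient.

    Values above 1 indicate a risk factor, values below 1 a protective factor.
    """
    return math.exp(coefficient)


# Hypothetical toy labels, not data from the thesis.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Under this convention, a missed fraud case (false negative) lowers sensitivity, while a legitimate payment flagged for investigation (false positive) lowers specificity, which is the trade-off the abstract describes.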

Files

Original bundle

Name: thetard_payment_2021.pdf
Size: 2.17 MB
Format: Adobe Portable Document Format

Collections

Masters Degrees (Logistics)