A structured approach to mitigate significant risks associated with the use of machine learning models

Date
2020-03
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH SUMMARY: Many organisations find it challenging to analyse large and varied data sets to extract relevant insights that provide a competitive advantage. Traditional modelling and statistical techniques cannot analyse such data sets effectively. The use of machine learning models presents a potential solution. The problem is that governing bodies and senior management do not always understand machine learning, the significant risks associated with the use, development and deployment of machine learning models, or the controls required to mitigate those risks. The aim of this research is to investigate machine learning, machine learning models, big data and data analytics, to identify significant risks and to recommend mitigating controls. A literature review provided the theoretical foundation for the research. It focused on understanding machine learning, big data, data analytics, corporate governance, information and technology governance and the use of frameworks to facilitate effective governance. COBIT 2019 was selected as the most appropriate framework to identify and mitigate significant risks associated with machine learning models. To further facilitate the identification of significant risks, the core components of machine learning, as well as a machine learning development life cycle, were identified and described. The research found that machine learning consists of four core components, namely tasks, data, algorithms and models, which are combined into a functional machine learning model through an iterative machine learning development life cycle. Based on this understanding of the core components and the development life cycle, COBIT 2019 was used to identify significant risks related to the use of machine learning models at a strategic level and at an operational or technological level.
Strategic-level risks included inadequate governance and management practices, a lack of benefits realisation and a lack of skills to develop and deploy machine learning models. Significant risks at the operational or technological level included: (i) risks affecting the ability of machine learning models to achieve their objectives, such as cost and data- and model-related risks, (ii) risks affecting the operational effectiveness of machine learning, such as information security risks, scalability and integration, and (iii) risks relating to the machine learning development life cycle. After the significant risks had been identified, mitigating controls were formulated to address them. These controls included appropriate governance and management practices, strategies and policies, controls over human skills and resources and organisational change management, data management controls, controls over the IT infrastructure, model validation controls, controls over vendors and third parties and controls over the machine learning development life cycle. To summarise the research, a risk-and-control matrix was prepared to link the significant risks identified to the relevant mitigating controls.
AFRIKAANS SUMMARY (translated into English): Many organisations find it challenging to analyse large and varied data sets in order to extract relevant insights for competitive advantage. Traditional modelling and statistical techniques are not effective at analysing large and varied data sets. The use of machine learning models offers a possible solution. The problem is that governing bodies and senior management do not always understand machine learning, the significant risks associated with its use, development and deployment, and the controls needed to reduce those risks. The aim of this research is to investigate machine learning, big data and data analytics, to identify significant risks and to recommend mitigating controls. A literature review, focused on understanding machine learning, big data, data analytics, corporate governance, information and technology governance and the use of frameworks to bring about effective governance, was carried out to provide a theoretical foundation for the research. COBIT 2019 was chosen as the most suitable framework to identify and mitigate significant machine learning risks. To further facilitate risk identification, the core components of machine learning, as well as a machine learning development life cycle, were identified and defined. This research found that machine learning consists of four core components, namely tasks, data, algorithms and models, supported by an iterative development life cycle that turns the components into a working machine learning model. COBIT 2019, the core components of machine learning and the machine learning development life cycle were then used to identify significant strategic and operational or technological risks associated with the use of machine learning models.
Significant strategic risks include inadequate corporate governance and management practices, a lack of benefits realisation and a lack of machine learning skills. Significant operational and technological risks include: (i) risks that can hinder the achievement of machine learning objectives, such as cost and data- and model-related risks, (ii) risks that hinder the operational effectiveness of machine learning, such as information security risks, scalability and integration, and (iii) risks relating to the machine learning development life cycle. After the significant risks were identified, controls were formulated to mitigate them. These controls include appropriate corporate governance and management practices, strategies and policies, controls over human skills and resources and organisational change management, data management controls, controls over the IT infrastructure, model validation, controls over third parties and controls over the machine learning development life cycle. A risk-and-control matrix was prepared to link the significant risks to the relevant mitigating controls.
Description
Thesis (MCom)--Stellenbosch University, 2020.