Application of statistics and machine learning in healthcare

Date
2019-04
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH SUMMARY : Clinical performance and cost efficiency are key focus areas in the healthcare industry, since providing quality, affordable healthcare is a continuing challenge. The goal of this research is to use statistical analysis and modelling to improve efficiency in healthcare by focussing on readmissions. Readmissions to hospital can indicate poor clinical care and have immense cost implications, so it is advantageous to keep them to a minimum. Stakeholders generally view strategies to address the clinical performance of healthcare providers, such as the readmission rate, as mainly clinical in nature. This study instead investigates the potential role of machine learning in improving clinical outcomes. It defines machine learning as the identification of complex patterns (linear or non-linear) in observed data, with the goal of predicting an outcome for new cases by mimicking the true underlying pattern in the population that led to the observed outcomes in the sample, while limiting rigid structural assumptions throughout. The question at hand is whether patients at risk of readmission can be identified, along with the risk factors associated with an increased likelihood of readmission. If so, this provides an opportunity to reduce the number of readmissions and thus avoid the resulting cost and clinical consequences. Identifying a patient as at risk of readmission creates an opportunity for early clinical intervention. In addition, the model makes it possible to calculate risk scores for patients, which in turn enables risk adjustment of the reported readmission rates. The data under consideration in this study are healthcare data generated by the operations of an international healthcare provider, Mediclinic International.
The research is based on patient data captured at hospital level in all Mediclinic hospitals operating on Mediclinic International’s Southern African platform. Several statistical algorithms exist to model the responses of interest, ranging from simple, well-known techniques to more advanced ones. Logistic regression and decision trees are examples of simple techniques, while neural networks and support vector machines (SVM) are more complex. SAS Enterprise Guide is used for data preparation, and SAS Enterprise Miner for the machine learning component of this study. The study aims to provide insight into machine learning techniques and to construct machine learning models that predict readmissions with reasonable accuracy.
AFRIKAANS SUMMARY : In the private healthcare industry, emphasis is placed on measuring clinical performance and cost efficiency, because delivering quality, affordable healthcare is a continuing challenge. The aim of this study is to consider statistical analyses that have the potential to contribute to the task of improving efficiency in healthcare. The study focuses mainly on readmissions, owing to their importance as a measure of the quality of healthcare and to the immense financial consequences they bring about. The benefits of reducing the number of readmissions are considerable. In general, stakeholders regard strategies to improve clinical performance as medical in nature. As an alternative, this study investigates the possible role that statistical learning theory, that is, statistical algorithms, can play in the task of improving clinical effectiveness and performance. Statistical learning theory can be described as the identification of complex patterns in observed data with a view to predicting an outcome of interest, by mimicking the underlying pattern that gave rise to the observed data while throughout avoiding rigid structural assumptions about the model structure. The question that arises is whether readmissions, together with the factors that contribute appreciably to the occurrence of a readmission, can be identified. If so, this will assist clinical workers in the task of preventing readmissions and thereby improve the clinical performance of hospitals. As soon as the statistical model identifies a patient as a risk case, clinical workers will have the opportunity to act early in order to prevent a readmission.
In addition, the statistical model will provide probabilities that can be used to adjust the readmission rate of hospitals for the degree of risk experienced. The data considered in this study are patient data captured during a visit to a hospital. The private healthcare company involved is Mediclinic International. The data are generated in the Southern African platform of Mediclinic International. Several statistical algorithms and models exist that can model the outcome of readmissions. Some techniques are well known, for example decision trees, while other techniques, such as neural networks, are less commonplace. Logistic regression is another example of a well-known technique. Support vector machines are less well known and also more complex. SAS Enterprise Guide is the software chosen for data preparation in this study, while SAS Enterprise Miner is the software used for the modelling. The aim of this study is, firstly, to shed light on statistical learning theory together with the statistical techniques that accompany it. Secondly, the study aims to use statistical modelling to predict readmissions with satisfactory accuracy.
Description
Thesis (MCom)--Stellenbosch University, 2019.
Keywords
Machine learning, Logistic regression analysis, Support vector machines, Neural networks (Computer science), Decision trees, Hospitals -- Admission and discharge -- Data mining, SAS (Computer file), UCTD