Paving the way for the use of prediction modelling in a health care environment

Van Zyl, Ilse (2011-10)

The aim of the project is to pave the way for a research team by providing insights and identifying possible pitfalls in the development of a Predictive Patient Admission Algorithm (PPAA).

Final year project, 2011

Technical Report

ENGLISH ABSTRACT: The high cost of hospitalisation is a challenge for health insurance companies, governments and individuals alike. In 2006, studies concluded that well over $30 billion was spent on unnecessary hospitalisations in the United States of America, where unnecessary hospitalisations are those that could have been prevented through early patient diagnosis and treatment. There is clearly room for improvement in this regard. Where lives are at stake, prevention is always better than cure, and successful hospitalisation prediction may make hospitalisation prevention a realistic possibility. The Heritage Provider Network, a health insurance and health care provider and sponsor of the Heritage Health Prize (HHP) Competition, has come to realise the potential benefits that a hospitalisation prediction model could bring about (Heritage Provider Network Health Prize, 2011). The competition is aimed at producing an effective predictive patient admission algorithm (PPAA) that predicts the number of days a member will be hospitalised in the next period, using health insurance claims data of the current period. The ultimate goal is to prevent the unnecessary hospitalisation of identified members in the network. If successful, this could have many benefits for wider society, including fewer critical medical cases, fewer claims and consequently lower expenses for all stakeholders in the affected system. The competition serves as inspiration for this study, which aims to pave the way for the research team who will be developing such a PPAA. This was accomplished by providing insights and identifying possible pitfalls in the development of a PPAA, using the Heritage Health Prize case study as a reference.
Typically available hospitalisation data that serve as input for the PPAA are briefly described, together with recommendations on methods and technologies with which to extract, transform and load (ETL) data within this context. A list of contender techniques was assembled based on the given data, the algorithm’s expected input requirements and the techniques’ ability to meet these needs. The prediction modelling techniques reviewed include classification and regression trees (CART), multivariate adaptive regression splines (MARS), neural networks and ensemble methods. Techniques were compared in terms of a set of criteria needed to use the available data and give the desired outputs. The data mining technologies considered for modelling with the preferred technique include Statistica data miner, SPSS Clementine, SAS Enterprise Miner, Matlab, Excel with VBA and R. These technologies were also compared on how well they can model the available data with the contender techniques. The research team’s compatibility with the technologies was also considered. The recommended prediction modelling technique is ensemble methods; the recommended technology for ETL is SQL Server, and for prediction model building Statistica, R or Matlab. Experimentation was conducted with the CART, MARS and Random Forests techniques in the available technologies in order to support the research team’s future prediction modelling decisions. It was concluded that the included predictor variables do not have sufficient predictive power for the use of CART, MARS and neural networks, and that Random Forests deliver more favourable results; it was recommended that this technique be explored further for the HHP application.
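To make the idea of an ensemble method concrete, the following is a minimal sketch in Python, not the study's implementation. It assumes synthetic data with two hypothetical predictors (a claim count and an age band) and a target of 0 to 15 days in hospital. It uses one-split regression trees ("stumps") in place of full CART trees, and plain bagging (bootstrap resampling plus averaging) in place of a full Random Forest, which would also subsample features at each split. All function and variable names are illustrative.

```python
import random
import statistics

def fit_stump(rows, targets):
    """Fit a one-split regression tree: choose the feature and threshold
    that minimise the summed squared error around the two leaf means."""
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[j] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[j] > threshold]
            if not left or not right:
                continue
            lm, rm = statistics.fmean(left), statistics.fmean(right)
            sse = (sum((t - lm) ** 2 for t in left)
                   + sum((t - rm) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lm, rm)
    if best is None:
        # No valid split (all rows identical): predict the overall mean.
        m = statistics.fmean(targets)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    j, threshold, left_mean, right_mean = stump
    return left_mean if row[j] <= threshold else right_mean

def fit_bagged(rows, targets, n_models, rng):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx]))
    return models

def predict_bagged(models, row):
    """The ensemble prediction is the average over all stumps."""
    return statistics.fmean(predict_stump(m, row) for m in models)

def make_members(n, rng):
    """Synthetic members: (claim_count, age_band) -> days in hospital.
    The relationship below is invented purely for illustration."""
    rows, days = [], []
    for _ in range(n):
        claims = rng.randint(0, 20)
        age_band = rng.randint(0, 8)
        expected = 0.2 * claims + 0.3 * age_band
        rows.append((claims, age_band))
        days.append(min(15, max(0, round(expected + rng.gauss(0, 1)))))
    return rows, days

rng = random.Random(0)
train_rows, train_days = make_members(200, rng)
models = fit_bagged(train_rows, train_days, n_models=25, rng=rng)

low = predict_bagged(models, (1, 1))    # few claims, young age band
high = predict_bagged(models, (19, 8))  # many claims, old age band
print(f"predicted days: low-risk {low:.2f}, high-risk {high:.2f}")
```

Averaging many trees trained on resampled data reduces the variance of any single tree, which is one plausible reason an ensemble could outperform a single CART model on weak predictors. The sketch says nothing about the results reported in the study.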

AFRIKAANS ABSTRACT (translated): The high cost of hospitalisation is a challenge for many medical schemes, governments and individuals. In 2006, studies showed that more than $30 billion was spent on unnecessary hospitalisation in the United States of America, where unnecessary hospitalisations are those cases that could have been prevented through early diagnosis and treatment. There is clearly room for improvement in this regard. Where lives are at stake, prevention is always better than treatment, and if hospitalisation can be predicted successfully, hospitalisation prevention can become a realistic possibility. The Heritage Provider Network, a health insurance and health care provider and the sponsor of the Heritage Health Prize (HHP) competition, has realised the potential benefits of hospitalisation prediction (Heritage Health Prize, 2011). The competition is aimed at designing an effective hospitalisation prediction algorithm that can predict the number of days a member will be hospitalised in the next period. Such an algorithm will be built using health insurance claims and hospitalisation data. The goal is ultimately to prevent the unnecessary hospitalisation of identified members. If successful, this can lead to fewer critical medical cases, fewer claims and consequently lower costs for all stakeholders in the system concerned. The competition serves as inspiration for this study, which aims to pave the way for the research team that will develop the algorithm further. Insights and possible pitfalls in the development of a prediction algorithm are highlighted, using the Heritage Health Prize case study as a reference.
In the study, typically available hospitalisation data, which serve as input for the algorithm, are briefly described, together with recommendations on the methods and technology for extracting, transforming and loading (ETL) data within this context. A list of the techniques considered was compiled, based on the case study data, the algorithm’s expected input requirements and the techniques’ ability to meet these requirements. The prediction techniques include classification and regression trees (CART), multivariate adaptive regression splines (MARS), neural networks and ensemble methods. Techniques were also compared in terms of a set of criteria needed to use the available data and deliver the desired outputs. The data mining technologies considered include Statistica data miner, SPSS Clementine, SAS Enterprise Miner, Matlab, Excel with VBA and R. These technologies were compared with reference to how well they can accommodate the prediction techniques considered. The research team’s compatibility with the technologies was also taken into account. The recommendation regarding prediction techniques was to use ensemble methods; the choice of technology for ETL is SQL Server; a prediction model can be built in R or Matlab, and Statistica can be used for exploration purposes. Experiments were carried out on CART, MARS and Random Forests (an ensemble method) in the available technologies, with the aim of supporting the research team’s future decision-making on the modelling of the prediction algorithm. It was concluded that the chosen predictor variables are not effective with the prediction techniques CART, MARS and neural networks. The experiments done on Random Forests delivered more favourable results. It is therefore recommended that this technique be investigated further for use in the HHP case study.
