Markov modelling of disease progression in the presence of missing covariates

Kotze, Loamie (2019-04)

Thesis (MCom)--Stellenbosch University, 2019.

Thesis

ENGLISH SUMMARY : Breast cancer is highly prevalent amongst women. The stages of breast cancer are influenced by characteristics such as age, hormone receptor status, HER2 status and staging information (TNM staging). This study aims to model the progression of breast cancer using a multi-state model with three pre-defined states. A secondary aim is to determine an appropriate technique for imputing missing data in the covariates. Disease progression can be modelled using multi-state models, and it is of interest to analyse the effect of different risk factors on the transitions between the states. The variable of interest is the state occupied by the individual at a given time point. The transition intensities of the multi-state model provide the hazards of moving from one state to another and can be used to calculate the mean sojourn time in any given state. A combination of claims data and authorisation treatment request data was obtained from Isimo Health for 393 breast cancer patients. Based on these data, a dataset was simulated using the TPmsm package in R. The simulated data were used to test two imputation techniques for the missing covariate data, one based on chained equations and one based on random forests. The random forest technique performed best on several performance measures and was used to impute the dataset from Isimo Health. Thereafter, a multi-state Markov model was fitted to the imputed data with three pre-defined states: curative (receiving treatment with the intent to cure), non-curative (receiving treatment with the intent to improve survival or control symptoms) and death. The Markov assumption was found not to hold, and a semi-Markov model was therefore fitted to the data. The findings showed that only one of the covariates, namely staging, had a significant effect on the transition probabilities, and only for the transition from the non-curative state to death. The covariates as a whole did have a significant effect on the transitions from curative to non-curative and from non-curative to death, but no significant effect on the transition from curative to death. It can be concluded, based on statistical measures, that the missForest package efficiently imputes missing covariates before disease progression is modelled with multi-state models using the p3state.msm package.
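The relationship between transition intensities and mean sojourn time mentioned in the summary can be illustrated with a minimal sketch. It assumes a time-homogeneous Markov model, in which the mean sojourn time in state r is -1/q_rr, where q_rr is the diagonal entry of the intensity matrix. The intensity values, the time unit (years) and the function names below are hypothetical and chosen for illustration only; they are not estimates from the thesis, and the sketch does not reproduce the semi-Markov model that was ultimately fitted.

```python
# Hypothetical transition intensities (per year) for a three-state
# illness-death model. Illustrative values only, not results from the thesis.
STATES = ["curative", "non-curative", "death"]

Q = [
    [-0.25, 0.20, 0.05],  # curative -> non-curative, curative -> death
    [0.00, -0.50, 0.50],  # non-curative -> death (no return to curative)
    [0.00, 0.00, 0.00],   # death is absorbing
]


def mean_sojourn_times(q):
    """Mean time spent in each transient state per visit: -1 / q_rr."""
    times = {}
    for r, row in enumerate(q):
        exit_rate = -row[r]
        if exit_rate > 0:  # absorbing states have no finite sojourn time
            times[STATES[r]] = 1.0 / exit_rate
    return times


def next_state_probabilities(q):
    """Probability that the next move from state r is to state s: q_rs / -q_rr."""
    probs = {}
    for r, row in enumerate(q):
        exit_rate = -row[r]
        if exit_rate > 0:
            probs[STATES[r]] = {
                STATES[s]: rate / exit_rate
                for s, rate in enumerate(row)
                if s != r and rate > 0
            }
    return probs


if __name__ == "__main__":
    for state, years in mean_sojourn_times(Q).items():
        print(f"mean sojourn time in {state}: {years:.2f} years")
    for state, moves in next_state_probabilities(Q).items():
        for target, p in moves.items():
            print(f"P(next state is {target} | leaving {state}) = {p:.2f}")
```

With these illustrative rates, the mean sojourn time is 4 years in the curative state and 2 years in the non-curative state. Each row of the intensity matrix sums to zero, so the diagonal entry is the negative of the total rate of leaving that state.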

AFRIKAANSE OPSOMMING (English translation) : Breast cancer is a highly prevalent cancer among women. The stage of breast cancer is influenced by characteristics such as hormone receptor status, HER2 status and the staging of the cancer (TNM staging). The study aims to model the progression of breast cancer using a multi-state model with three pre-defined states. A further aim is to find a suitable technique for imputing missing data in the covariates. Multi-state models are used to model the progression of breast cancer, and it is desirable to analyse the effect of different risk factors on the transition intensities between states. The variable of interest can be seen as the state in which the individual finds themselves at that moment. The transition intensities of multi-state models provide the hazard rate of moving from one state to the next. The transition intensities can also be used to calculate the mean sojourn time in any given state. A combination of claims data and authorisation treatment request data was obtained from Isimo Health for 393 breast cancer patients. The TPmsm package in R was used to simulate a dataset based on the Isimo Health data. The simulated data were used to test different imputation techniques for filling in the missing data in the covariates. The imputation technique based on random forests performed best and was therefore used to impute the Isimo Health dataset. The missForest package in R was used to perform the imputation. After the imputation, a multi-state Markov model was fitted with three pre-defined states, namely curative (receiving treatment with the intent to cure), non-curative (receiving treatment with the intent to improve survival or control symptoms) and death. The Markov assumption does not hold, and a semi-Markov model was therefore fitted to the data. The findings show that the stage of the cancer is the only covariate with a statistically significant effect on the transition probabilities. This is the case only for the transition between the non-curative and death states. The covariates as a whole have a statistically significant effect on the transition probabilities from curative to non-curative and from non-curative to death. They do not have a statistically significant effect on the transition from curative to death. The missForest package is the most suitable package for imputing covariates with missing values. This conclusion is based on various statistical measures. Thereafter, the p3state.msm package can be used to model the progression of breast cancer.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/106018