ITEM VIEW

Effects of different missing data imputation techniques on the performance of undiagnosed diabetes risk prediction models in a mixed-ancestry population of South Africa

DC Field | Value | Language
dc.contributor.author | Masconi, Katya L. | en_ZA
dc.contributor.author | Matsha, Tandi E. | en_ZA
dc.contributor.author | Erasmus, Rajiv T. | en_ZA
dc.contributor.author | Kengne, Andre P. | en_ZA
dc.date.accessioned | 2017-10-10T09:59:48Z |
dc.date.available | 2017-10-10T09:59:48Z |
dc.date.issued | 2016 |
dc.identifier.citation | Masconi, K. L., et al. 2016. Effects of different missing data imputation techniques on the performance of undiagnosed diabetes risk prediction models in a mixed-ancestry population of South Africa. PLoS ONE, 10(9):e0139210, doi:10.1371/journal.pone.0139210 | en_ZA
dc.identifier.issn | 1932-6203 (online) |
dc.identifier.other | doi:10.1371/journal.pone.0139210 |
dc.identifier.uri | http://hdl.handle.net/10019.1/102305 | en_ZA
dc.description | CITATION: Masconi, K. L., et al. 2016. Effects of different missing data imputation techniques on the performance of undiagnosed diabetes risk prediction models in a mixed-ancestry population of South Africa. PLoS ONE, 10(9):e0139210, doi:10.1371/journal.pone.0139210. | en_ZA
dc.description | The original publication is available at http://journals.plos.org/plosone | en_ZA
dc.description.abstract | Background: Imputation techniques used to handle missing data are based on the principle of replacement. It is widely advocated that multiple imputation is superior to other imputation methods; however, studies have suggested that simple methods for filling in missing data can be just as accurate as complex methods. The objective of this study was to implement a number of simple and more complex imputation methods and to assess the effect of these techniques on the performance of undiagnosed diabetes risk prediction models during external validation. Methods: Data from the Cape Town Bellville-South cohort served as the basis for this study. Imputation methods and models were identified via recent systematic reviews. The models' discrimination was assessed and compared using the C-statistic and non-parametric methods, before and after recalibration through simple intercept adjustment. Results: The study sample consisted of 1256 individuals, of whom 173 were excluded due to previously diagnosed diabetes. Of the final 1083 individuals, 329 (30.4%) had missing data. Family history had the highest proportion of missing data (25%). Imputation of the outcome, undiagnosed diabetes, was highest with stochastic regression imputation (163 individuals). Overall, deletion resulted in the lowest model performance, while simple imputation yielded the highest C-statistic for the Cambridge Diabetes Risk model, Kuwaiti Risk model, Omani Diabetes Risk model and Rotterdam Predictive model. Multiple imputation yielded the highest C-statistic only for the Rotterdam Predictive model, and this was matched by simpler imputation methods. Conclusions: Deletion was confirmed as a poor technique for handling missing data. However, despite the frequently emphasized disadvantages of simpler imputation methods, this study showed that these methods yield predictive utility for undiagnosed diabetes similar to that of multiple imputation. | en_ZA
dc.description.uri | http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0139210 | en_ZA
dc.format.extent | 12 pages : illustrations (chiefly colour) | en_ZA
dc.language.iso | en_ZA | en_ZA
dc.publisher | Public Library of Science | en_ZA
dc.subject | Diabetes -- Risk factors | en_ZA
dc.subject | Diabetes -- Prediction models | en_ZA
dc.subject | Diabetes -- South Africa -- Western Cape | en_ZA
dc.subject | Diabetes -- Racially mixed people | en_ZA
dc.title | Effects of different missing data imputation techniques on the performance of undiagnosed diabetes risk prediction models in a mixed-ancestry population of South Africa | en_ZA
dc.type | Article | en_ZA
dc.description.version | Publisher's version | en_ZA
dc.rights.holder | Authors retain copyright | en_ZA
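The abstract compares deletion, simple imputation, stochastic regression imputation and multiple imputation, and judges each by the models' C-statistic. The Python sketch below illustrates three of those handling strategies plus the C-statistic on toy data. It is not the authors' code. The function names, the use of None for missing values, the single fully observed predictor used in the regression, and the toy numbers are all assumptions made for illustration. Multiple imputation and intercept recalibration are not shown.

```python
import math
import random
from statistics import mean


def complete_case(rows):
    """Deletion (complete-case analysis): drop every row containing None."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_impute(column):
    """Simple imputation: replace each None with the mean of observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def stochastic_regression_impute(x, y, rng):
    """Impute missing y from a fully observed predictor x (assumed setup).

    Fits ordinary least squares on the complete (x, y) pairs, then fills each
    missing y with the fitted value plus a normal draw whose standard deviation
    is the residual standard error, so imputed values keep some natural spread.
    """
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    mx = mean(a for a, _ in pairs)
    my = mean(b for _, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    slope = sum((a - mx) * (b - my) for a, b in pairs) / sxx
    intercept = my - slope * mx
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in pairs)
    sd = math.sqrt(rss / max(len(pairs) - 2, 1))
    return [
        intercept + slope * a + rng.gauss(0.0, sd) if b is None else b
        for a, b in zip(x, y)
    ]


def c_statistic(scores, outcomes):
    """C-statistic (area under the ROC curve) for a binary outcome.

    Counts the share of (case, non-case) pairs in which the case has the
    higher predicted risk; tied pairs count as one half.
    """
    cases = [s for s, o in zip(scores, outcomes) if o == 1]
    controls = [s for s, o in zip(scores, outcomes) if o == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return concordant / (len(cases) * len(controls))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data for illustration only.
    predictor = [22.0, 31.5, 27.0, 35.2, 24.1, 29.8]
    partly_missing = [80.0, None, 92.0, 110.0, None, 99.0]
    print(complete_case(list(zip(predictor, partly_missing))))
    print(mean_impute(partly_missing))
    print(stochastic_regression_impute(predictor, partly_missing, rng))
    print(c_statistic([0.9, 0.2, 0.6, 0.4], [1, 0, 1, 0]))  # → 1.0
```

Multiple imputation would go further than this single stochastic draw. It repeats the stochastic step to create several completed datasets, fits the model on each, and pools the results, for example with Rubin's rules.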
