Doctoral Degrees (Epidemiology and Biostatistics)
Browsing Doctoral Degrees (Epidemiology and Biostatistics) by advisor "Machekano, Rhoderick"
The impact of missing data on estimating HIV/AIDS prevalence and incidence in demographic sentinel survey studies (Stellenbosch : Stellenbosch University, 2022-04) Mosha, Neema Ramadhani; Machekano, Rhoderick; Young, Taryn; Todd, Jim; Stellenbosch University. Faculty of Medicine and Health Sciences. Dept. of Global Health. Epidemiology and Biostatistics.
ENGLISH SUMMARY:
Background: Missing data is a challenge in most research, especially in observational population data such as demographic surveys. These studies often account for survey design and clustering when estimating disease prevalence or incidence, but do not account for missing data. In other cases, they do not state explicitly how missing data were dealt with during analysis, or they handle them inappropriately. Conceptualising the pattern of missingness and the mechanism by which it occurs is challenging, as is applying the often complex methods for handling missing data. Ignoring missingness in survey data can produce biased estimates and invalid conclusions. The primary aim of this PhD was to evaluate the impact of missing data on estimating HIV/AIDS prevalence in demographic sentinel surveillance studies.
Methods: A systematic review of HIV studies was conducted to identify and describe the methods used to analyse studies with missing data. A series of simulation studies then explored the precision and efficiency of prevalence estimates obtained by complete case analysis (CCA), multiple imputation (MI), inverse probability weighting (IPW) and a doubly robust (DR) estimator when survey data are missing at random (MAR).
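The core contrast in these simulations, complete case analysis versus weighting when the outcome is MAR, can be illustrated with a short Python sketch. It is a minimal example under stated assumptions and not the thesis's simulation code: the single binary covariate x (think of it as residence), the prevalences, the testing probabilities and all function names are hypothetical. Because testing depends on x and prevalence differs by x, the tested sample over-represents the low-prevalence group, so CCA is biased, while inverse probability weighting recovers the population value.

```python
import random

# Hypothetical data-generating values, chosen only for illustration.
P_URBAN = 0.4                    # share of the population with x = 1
PREV = {0: 0.05, 1: 0.12}        # true HIV prevalence by x
P_TESTED = {0: 0.7, 1: 0.3}      # probability of being tested (observed) by x


def simulate_survey(n, seed=1):
    """Return (x, y, r) tuples; r = 1 if HIV status y was observed.
    Testing depends only on x, so y is missing at random (MAR) given x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < P_URBAN else 0
        y = 1 if rng.random() < PREV[x] else 0
        r = 1 if rng.random() < P_TESTED[x] else 0
        rows.append((x, y, r))
    return rows


def cca_prevalence(rows):
    """Complete case analysis: prevalence among tested individuals only."""
    observed = [y for _, y, r in rows if r]
    return sum(observed) / len(observed)


def ipw_prevalence(rows):
    """Normalised (Hajek) IPW estimate of prevalence.
    P(R = 1 | x) is estimated as the observed testing rate within each level
    of x, a saturated and therefore correctly specified missingness model here."""
    p_obs = {}
    for level in (0, 1):
        stratum = [r for x, _, r in rows if x == level]
        p_obs[level] = sum(stratum) / len(stratum)
    weighted_cases = sum(y / p_obs[x] for x, y, r in rows if r)
    total_weight = sum(1 / p_obs[x] for x, _, r in rows if r)
    return weighted_cases / total_weight


if __name__ == "__main__":
    rows = simulate_survey(200_000)
    true_prev = P_URBAN * PREV[1] + (1 - P_URBAN) * PREV[0]  # 0.078
    print(f"true {true_prev:.3f}  CCA {cca_prevalence(rows):.3f}  IPW {ipw_prevalence(rows):.3f}")
```

With these assumed values the true prevalence is 0.078, whereas CCA converges to about 0.066. Estimating P(R = 1 | x) separately within each level of x makes this IPW estimator equivalent to post-stratification; with continuous covariates a logistic regression for the response indicator would usually take its place, and the estimate is then only as good as that model.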
Descriptive statistics and a complete case analysis, ignoring the missingness, were then used to estimate HIV incidence and population prevalence in four survey rounds of the Magu Health Demographic Sentinel Surveillance (HDSS). The surveys were conducted between 2006 and 2016 and included adults aged 15 years and above; about 50% of the population was tested for HIV in each survey. Data exploration followed, assessing how missingness occurred and how it was associated with other study characteristics. Finally, the statistical methods used in the simulation studies were applied to re-estimate HIV prevalence from the survey data, taking the missingness into account.
Results: The systematic review found 24 eligible articles from population, demographic and cross-sectional surveys that acknowledged the presence of missing data. In these studies, complete case analysis was the standard method (used in 100% of articles), followed by multiple imputation (46%) and Heckman selection models (38%). The simulation study generated a hypothetical HIV survey with 32 scenarios in which 20% or 55% of the outcome was missing. When data were MAR, complete case analysis produced biased and inefficient estimates. MI, IPW and DR were valid and efficient when the imputation or missingness models were correctly specified, and if either the imputation (MI) model or the weighting (IPW) model was mis-specified, the DR estimator could still be valid. Provided the models were correctly specified, MI produced the least biased estimates, even with 55% of the data missing; however, with 55% missingness all estimators were less reliable. In the complete case analysis, the overall population HIV prevalence decreased from 7.2% in 2006 to 6.6% in 2016. Cox models were used to analyse HIV incidence and its risk factors by sex.
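Incidence expressed per 1,000 person-years is the number of new infections divided by the person-time at risk, scaled by 1,000. The Python sketch below computes crude rates by group from hypothetical follow-up records; the (group, seroconverted, person_years) layout and the function name are assumptions for illustration. It does not fit a Cox model, which estimates hazard ratios for risk factors rather than the rates themselves.

```python
from collections import defaultdict


def incidence_per_1000_py(records):
    """Crude incidence rate per 1,000 person-years for each group.

    records: iterable of (group, seroconverted, person_years) tuples, one per
    follow-up interval of an initially HIV-negative participant (hypothetical layout).
    """
    events = defaultdict(int)
    person_years = defaultdict(float)
    for group, seroconverted, py in records:
        events[group] += int(seroconverted)   # count new infections
        person_years[group] += py             # accumulate time at risk
    return {g: 1000 * events[g] / person_years[g] for g in person_years}


# Toy follow-up records (made up for illustration, not survey data).
toy = [("F", True, 2.5), ("F", False, 2.5), ("M", False, 8.0), ("M", True, 2.0)]
print(incidence_per_1000_py(toy))  # {'F': 200.0, 'M': 100.0}
```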
The incidence rate was 5.5 per 1,000 person-years in women, compared with 4.6 per 1,000 person-years in men. In bivariate analysis, residence, marital status, mobility, and having two or more partners were associated with increased HIV incidence. Missingness of HIV status was as high as 60.3% (in the 2016 survey) and, in all surveys (Sero 5 to 8), was associated with age, sex, residence and marital status. Further analysis using MI, IPW and DR, assuming the outcome was MAR, showed that overall HIV prevalence did not differ significantly from the complete case estimates in any of the four surveys. However, there were significant differences in the HIV estimates when stratified by covariates. In terms of confidence interval width, multiple imputation outperformed IPW and DR, producing narrower intervals.
Conclusion: Overall, this dissertation showed that despite the availability of methods to adjust for missing data, many surveys still ignore missingness, and reporting in articles that did adjust for it fell below guideline standards. Understanding the mechanism of missingness supports the proper application of advanced methods to account for it. With data missing at random, IPW, MI and DR can account for the missingness and produce unbiased and efficient estimates in HIV survey studies. Clearer, simpler guidance and greater awareness are still needed so that researchers can make informed choices about which method to apply and in which situations it works best, making estimates more reliable and representative.
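Augmented inverse probability weighting (AIPW) is one common form of doubly robust estimator; the summary does not specify which DR estimator the thesis used, so this is an illustrative choice. AIPW averages m(x) + r(y − m(x))/π(x) over all sampled individuals, where m is an outcome model fitted to the tested and π is a model for the probability of being tested. The minimal Python sketch below, on a hypothetical MAR survey (all values and function names are assumptions), shows the double robustness property: the estimate stays close to the truth when either model is correct, but not when both are wrong.

```python
import random


def simulate(n, seed=7):
    """Hypothetical survey: x = binary covariate, y = HIV status (None if untested),
    r = 1 if tested. Testing depends on x only, so y is MAR given x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.4 else 0
        y = 1 if rng.random() < (0.12 if x else 0.05) else 0
        r = 1 if rng.random() < (0.3 if x else 0.7) else 0
        data.append((x, y if r else None, r))
    return data


def fit_outcome_by_x(data):
    """Correct outcome model: m(x) = observed prevalence among the tested with covariate x."""
    m = {}
    for level in (0, 1):
        ys = [y for x, y, r in data if r and x == level]
        m[level] = sum(ys) / len(ys)
    return lambda x: m[x]


def fit_response_by_x(data):
    """Correct missingness model: pi(x) = testing rate within covariate level x."""
    pi = {}
    for level in (0, 1):
        rs = [r for x, _, r in data if x == level]
        pi[level] = sum(rs) / len(rs)
    return lambda x: pi[x]


def fit_outcome_constant(data):
    """Mis-specified outcome model: ignores x."""
    ys = [y for _, y, r in data if r]
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_response_constant(data):
    """Mis-specified missingness model: ignores x."""
    rate = sum(r for _, _, r in data) / len(data)
    return lambda x: rate


def aipw_prevalence(data, m, pi):
    """AIPW estimate: mean over all n of m(x) + r * (y - m(x)) / pi(x)."""
    total = 0.0
    for x, y, r in data:
        total += m(x)
        if r:
            total += (y - m(x)) / pi(x)
    return total / len(data)


if __name__ == "__main__":
    data = simulate(200_000)
    good_m, good_pi = fit_outcome_by_x(data), fit_response_by_x(data)
    bad_m, bad_pi = fit_outcome_constant(data), fit_response_constant(data)
    print("outcome model correct: ", round(aipw_prevalence(data, good_m, bad_pi), 3))
    print("response model correct:", round(aipw_prevalence(data, bad_m, good_pi), 3))
    print("both mis-specified:    ", round(aipw_prevalence(data, bad_m, bad_pi), 3))
```

Under these assumed values the true prevalence is 0.4 × 0.12 + 0.6 × 0.05 = 0.078. The first two estimates should land close to it, whereas with both models mis-specified the correction term cancels and AIPW reduces to the complete case estimate of about 0.066.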