Doctoral Degrees (Statistics and Actuarial Science)
Browsing Doctoral Degrees (Statistics and Actuarial Science) by Author "Luus, Retha"
- Statistical inference of the multiple regression analysis of complex survey data (Stellenbosch : Stellenbosch University, 2016-12) Luus, Retha; De Wet, Tertius; Neethling, Ariane; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics & Actuarial Science. ENGLISH SUMMARY: The quality of the inferences and results put forward from any statistical analysis depends directly on using the correct method at the analysis stage. Most survey data analyzed in practice originate from stratified multistage cluster samples, i.e. complex samples. In developed countries the statistical analysis, for example linear modelling, of complex sampling (CS) data, otherwise known as survey-weighted least squares (SWLS) regression, has received some attention over time. In developing countries such as South Africa and the rest of Africa, by contrast, SWLS regression is often confused with weighted least squares (WLS) regression or, in some extreme cases, the CS design is ignored and an ordinary least squares (OLS) model is fitted to the data. Furthermore, especially in the developing countries, inference concerning the linear modelling of a continuous response is not as well documented as inference for a categorical response, specifically a dichotomous response. Hence, the decision was made to research the linear modelling of a continuous response under CS, with the objective of illustrating how the results could differ if the statistician ignores the complex design of the data or naively applies WLS instead of the correct SWLS regression. The complex sampling design leads to observations having unequal inclusion probabilities, the inverse of which is known as the design weight of an observation.
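As an illustration of the design weight defined above, the following minimal sketch computes weights as inverse inclusion probabilities and compares an unweighted fit with a design-weighted fit of a straight line. The function names and the toy data are hypothetical and are not taken from the thesis. The sketch covers point estimates only; the design-based variance estimation that distinguishes SWLS inference is not shown.

```python
def design_weights(inclusion_probs):
    """Design weight of each observation: the inverse of its inclusion probability."""
    return [1.0 / p for p in inclusion_probs]


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


# Hypothetical toy sample: the last two units were much less likely to be
# selected, so each of them represents more population units.
probs = [0.5, 0.5, 0.1, 0.1]
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 9.0, 12.2]

w = design_weights(probs)
unweighted_fit = weighted_line_fit(x, y, [1.0] * len(x))  # ignores the design
design_weighted_fit = weighted_line_fit(x, y, w)          # uses design weights

print("unweighted (intercept, slope):     ", unweighted_fit)
print("design-weighted (intercept, slope):", design_weighted_fit)
```

Setting every weight to 1 reproduces the ordinary least-squares line, which makes the two fits directly comparable on the same data.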
Once adjusted for unit non-response and differential non-response, the sampling weights can have large variability that could have an adverse effect on the estimation precision. Weight trimming is cautiously recommended as a remedy for this, but could also increase the bias of an estimator, which in turn affects the estimation precision. The effect of weight trimming on estimation precision is also investigated in this research. Two important parts of regression analysis are researched here, namely the evaluation of the fitted model and the inference concerning the model parameters. The model evaluation part includes the adjustment of well-known prediction error estimation methods, viz. leave-one-out cross-validation, bootstrap estimation and .632 bootstrap estimation, for application to CS data. It also considers a number of outlier detection diagnostics such as the leverages and Cook's distance. The model parameter inference includes bootstrap variance estimation as well as the construction of bootstrap confidence intervals, viz. the percentile, bootstrap-t, and BCa confidence intervals. Two simulation studies are conducted in this thesis. For the first simulation study a model was developed and then used to simulate a hierarchical population such that stratified two-stage cluster samples can be selected from this population. The second simulation study makes use of stratified two-stage cluster samples that are sampled from real-world data, i.e. the Income and Expenditure Survey of 2005/2006 conducted by Statistics South Africa. Similar conclusions are drawn from both simulation studies. These conclusions include that applying the incorrect linear model to CS data could lead to wrong conclusions, that weight trimming, when conducted with care, further improves estimation precision, and that linear modelling based on resampling methods such as the bootstrap could outperform standard linear modelling methods, especially when applied to real-world data.
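Two of the ideas in the summary, weight trimming and the percentile bootstrap interval, can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the thesis: the trimming rule (cap, then rescale to keep the weight total), the resampling scheme (whole clusters drawn with replacement within each stratum), the weighted mean used as the statistic, and all names are hypothetical.

```python
import random


def trim_weights(weights, cap):
    """Cap each weight at `cap`, then rescale so the weight total is unchanged.

    Single pass only: after rescaling, a weight may again exceed `cap`.
    """
    total = sum(weights)
    capped = [min(w, cap) for w in weights]
    scale = total / sum(capped)
    return [w * scale for w in capped]


def percentile_interval(replicates, alpha=0.05):
    """Percentile interval from bootstrap replicates (nearest order statistic)."""
    s = sorted(replicates)
    n = len(s)
    lower = s[int(round((alpha / 2) * (n - 1)))]
    upper = s[int(round((1 - alpha / 2) * (n - 1)))]
    return lower, upper


def bootstrap_weighted_mean(strata, n_boot=500, seed=0):
    """Replicate weighted means from resampling clusters within strata.

    strata: dict mapping stratum label -> list of clusters, where each
    cluster is a list of (value, weight) pairs. Within every stratum, as
    many clusters as were sampled are drawn with replacement.
    """
    rng = random.Random(seed)
    replicates = []
    for _rep in range(n_boot):
        num = den = 0.0
        for clusters in strata.values():
            for _draw in range(len(clusters)):
                for value, weight in rng.choice(clusters):
                    num += weight * value
                    den += weight
        replicates.append(num / den)
    return replicates


# Hypothetical two-stratum sample with two clusters per stratum.
sample = {
    "urban": [[(10.0, 2.0), (12.0, 2.0)], [(11.0, 3.0), (15.0, 3.0)]],
    "rural": [[(6.0, 8.0), (7.0, 8.0)], [(5.0, 9.0), (9.0, 9.0)]],
}
reps = bootstrap_weighted_mean(sample)
print("trimmed weights:", trim_weights([1.0, 1.0, 10.0], cap=4.0))
print("95% percentile interval:", percentile_interval(reps))
```

Resampling whole clusters rather than individual observations keeps the within-cluster dependence intact in every replicate, which is the reason for resampling at the cluster level in this sketch.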