Doctoral Degrees (Statistics and Actuarial Science)
Browsing Doctoral Degrees (Statistics and Actuarial Science) by Advisor "Neethling, Ariane"
- Statistical inference for inequality measures based on semi-parametric estimators (Stellenbosch : Stellenbosch University, 2011-12) Kpanzou, Tchilabalo Abozou; De Wet, Tertius; Neethling, Ariane; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
ENGLISH ABSTRACT: Measures of inequality, also used as measures of concentration or diversity, are very popular in economics, especially for measuring inequality in income or wealth within and between populations. However, they have applications in many other fields, e.g. ecology, linguistics, sociology, demography, epidemiology and information science. A large number of inequality measures have been proposed; examples include the Gini index, the generalized entropy, the Atkinson and the quintile share ratio measures. Inequality measures depend inherently on the tails of the population (the underlying distribution), and their estimators are therefore typically sensitive to data from these tails (nonrobust). For example, income distributions often exhibit a long right tail, leading to the frequent occurrence of large values in samples. Since the usual estimators are based on the empirical distribution function, they are usually nonrobust to such large values. Furthermore, heavy-tailed distributions often occur in real-life data sets, so remedial action needs to be taken in such cases. This action can take the form of either trimming the extreme data or modifying the (traditional) estimator to make it more robust to extreme observations. In this thesis we follow the second option, modifying the traditional empirical distribution function estimator to make it more robust. Using results from extreme value theory, we develop more reliable distribution estimators in a semi-parametric setting.
These new distribution estimators then form the basis for more robust estimators of the inequality measures. Such estimators are developed for the four most popular classes of measures, viz. Gini, generalized entropy, Atkinson and quintile share ratio. Their properties are studied mainly via simulation. Using limiting distribution theory and the bootstrap methodology, approximate confidence intervals are derived. In various simulation studies, the proposed estimators are compared to the standard ones in terms of mean squared error, relative impact of contamination, confidence interval length and coverage probability. In these studies the semi-parametric methods show a clear improvement over the standard ones. The theoretical properties of the quintile share ratio have received little attention. Consequently, we also derive its influence function as well as the limiting normal distribution of its nonparametric estimator; these results have not previously been published. To illustrate the methods developed, we apply them to a number of real-life data sets and show how the methods can be used for inference in practice. To choose between the candidate parametric distributions, a measure of sample representativeness from the literature is used. These illustrations show that the proposed methods can be used to reach satisfactory conclusions in real-life problems.
- Statistical inference of the multiple regression analysis of complex survey data (Stellenbosch : Stellenbosch University, 2016-12) Luus, Retha; De Wet, Tertius; Neethling, Ariane; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics & Actuarial Science.
ENGLISH SUMMARY: The quality of the inferences and results put forward from any statistical analysis depends directly on using the correct method at the analysis stage. Most survey data analyzed in practice originate from stratified multistage cluster samples (complex samples). In developed countries, the statistical analysis of complex sampling (CS) data has received some attention over time, including the linear modelling of such data, known as survey-weighted least squares (SWLS) regression. In developing countries such as South Africa and the rest of Africa, SWLS regression is often confused with weighted least squares (WLS) regression or, in some extreme cases, the CS design is ignored and an ordinary least squares (OLS) model is fitted to the data. This is in contrast to what is found in developed countries. Furthermore, especially in developing countries, inference for the linear modelling of a continuous response is not as well documented as inference for a categorical response, in particular a dichotomous response. Hence, the decision was made to research the linear modelling of a continuous response under CS, with the objective of illustrating how the results could differ from those of the correct SWLS regression if the statistician ignores the complex design of the data or naively applies WLS. The complex sampling design gives observations unequal inclusion probabilities; the inverse of an observation's inclusion probability is known as its design weight.
Once adjusted for unit nonresponse and differential nonresponse, the sampling weights can be highly variable, which could adversely affect estimation precision. Weight trimming is cautiously recommended as a remedy, but it could also increase the bias of an estimator, which in turn affects estimation precision. The effect of weight trimming on estimation precision is also investigated in this research. Two important parts of regression analysis are researched here, namely the evaluation of the fitted model and inference concerning the model parameters. The model evaluation part includes adjusting well-known prediction error estimation methods, viz. leave-one-out cross-validation, bootstrap estimation and .632 bootstrap estimation, for application to CS data. It also considers a number of outlier detection diagnostics, such as leverages and Cook's distance. The model parameter inference includes bootstrap variance estimation as well as the construction of bootstrap confidence intervals, viz. the percentile, bootstrap-t and BCa confidence intervals. Two simulation studies are conducted in this thesis. For the first, a model was developed and then used to simulate a hierarchical population from which stratified two-stage cluster samples can be selected. The second uses stratified two-stage cluster samples drawn from real-world data, namely the Income and Expenditure Survey of 2005/2006 conducted by Statistics South Africa. Both simulation studies lead to similar conclusions. These include that applying an incorrect linear model to CS data could lead to wrong conclusions; that weight trimming, when conducted with care, further improves estimation precision; and that linear modelling based on resampling methods such as the bootstrap could outperform standard linear modelling methods, especially when applied to real-world data.
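The four classes of inequality measures named in the first thesis (Kpanzou, 2011) have standard plug-in estimators based on the empirical distribution function. These are the nonrobust baseline that the thesis sets out to improve. The Python sketch below shows these textbook empirical versions together with a simple percentile bootstrap interval. It is not the thesis's semi-parametric method. The function names, the default parameters and the `n // 5` simplification of the quintile share ratio are illustrative assumptions.

```python
import math
import random


def gini(x):
    """Empirical Gini index (mean absolute difference over twice the mean)."""
    xs = sorted(x)
    n = len(xs)
    # Closed form on sorted data: 2*sum(i*x_(i)) / (n*sum(x)) - (n+1)/n, i = 1..n.
    weighted = sum((i + 1) * v for i, v in enumerate(xs))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1.0) / n


def generalized_entropy(x, alpha):
    """Empirical GE(alpha); alpha = 0 is the mean log deviation, alpha = 1 the Theil index."""
    n = len(x)
    mu = sum(x) / n
    if alpha == 0:
        return -sum(math.log(v / mu) for v in x) / n
    if alpha == 1:
        return sum((v / mu) * math.log(v / mu) for v in x) / n
    return (sum((v / mu) ** alpha for v in x) / n - 1.0) / (alpha * (alpha - 1.0))


def atkinson(x, eps):
    """Empirical Atkinson index with inequality-aversion parameter eps >= 0."""
    n = len(x)
    mu = sum(x) / n
    if eps == 1:
        # The equally distributed equivalent income is the geometric mean.
        ede = math.exp(sum(math.log(v) for v in x) / n)
    else:
        ede = (sum(v ** (1.0 - eps) for v in x) / n) ** (1.0 / (1.0 - eps))
    return 1.0 - ede / mu


def quintile_share_ratio(x):
    """Total of the top 20% of observations divided by the total of the bottom 20%.

    Simplification (assumption): uses the n // 5 largest and smallest values
    instead of interpolated quantiles, so it is exact only when 5 divides n.
    """
    xs = sorted(x)
    k = max(1, len(xs) // 5)
    return sum(xs[-k:]) / sum(xs[:k])


def percentile_bootstrap_ci(x, stat, b=2000, level=0.95, seed=1):
    """Percentile bootstrap interval: resample x with replacement b times and
    take the order statistics nearest the (1 - level)/2 and (1 + level)/2 quantiles."""
    rng = random.Random(seed)
    n = len(x)
    reps = sorted(stat([rng.choice(x) for _ in range(n)]) for _ in range(b))
    tail = (1.0 - level) / 2.0
    lo = reps[int(round(tail * b))]
    hi = reps[min(b - 1, int(round((1.0 - tail) * b)) - 1)]
    return lo, hi
```

For example, `gini([1, 2, 3, 4, 5])` is 4/15 (about 0.267) and `quintile_share_ratio([1, 2, 3, 4, 5])` is 5.0. With equal incomes, the Gini, generalized entropy and Atkinson values are all 0.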
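The first thesis builds its semi-parametric distribution estimators on extreme value theory. One common construction of this kind keeps the empirical distribution function below a high threshold. Above the threshold, it substitutes a Pareto tail whose index is estimated by the Hill estimator. This is offered only as a hedged sketch and is not necessarily the estimator developed in the thesis. The number of upper order statistics `k`, the Pareto tail model, the assumption of positive continuous data and all function names are illustrative assumptions.

```python
import bisect
import math
import random


def hill_estimator(x, k):
    """Hill estimate of the tail index alpha from the k largest (positive) observations.

    Returns (alpha_hat, threshold); the threshold is the (k+1)-th largest value.
    """
    xs = sorted(x, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("need 0 < k < n")
    u = xs[k]
    h = sum(math.log(v) for v in xs[:k]) / k - math.log(u)
    return 1.0 / h, u


def semiparametric_cdf(x, k):
    """Empirical CDF up to the threshold u, fitted Pareto tail above it."""
    n = len(x)
    alpha, u = hill_estimator(x, k)
    data = sorted(x)

    def cdf(t):
        if t <= u:
            # Empirical part: proportion of observations <= t.
            return bisect.bisect_right(data, t) / n
        # Pareto tail: P(X > t) is approximated by (k/n) * (t/u)^(-alpha).
        return 1.0 - (k / n) * (t / u) ** (-alpha)

    return cdf


def semiparametric_mean(x, k):
    """Mean of the fitted distribution: empirical body plus Pareto-tail mean.

    Requires alpha_hat > 1; otherwise the fitted tail has an infinite mean.
    """
    n = len(x)
    alpha, u = hill_estimator(x, k)
    if alpha <= 1.0:
        raise ValueError("fitted tail index <= 1: the mean is infinite")
    body = sorted(x)[: n - k]  # the n - k smallest values, each with mass 1/n
    # The tail carries mass k/n; a Pareto(u, alpha) variable has mean alpha*u/(alpha-1).
    return sum(body) / n + (k / n) * alpha * u / (alpha - 1.0)


if __name__ == "__main__":
    rng = random.Random(42)
    sample = [rng.paretovariate(2.5) for _ in range(2000)]  # true tail index 2.5
    alpha_hat, threshold = hill_estimator(sample, k=200)
    print(alpha_hat, threshold, semiparametric_mean(sample, k=200))
```

If such a fitted distribution replaces the empirical one in the functionals behind the Gini, generalized entropy, Atkinson or quintile share ratio measures, the result is a semi-parametric version of each estimator. The mean is shown here only because it has a closed form under a Pareto tail. In practice `k` is chosen with a stability plot or an optimality criterion, which this sketch does not attempt.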
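The second thesis (Luus, 2016) contrasts OLS, WLS and survey-weighted least squares (SWLS) regression and studies weight trimming. The sketch below is restricted to one predictor to keep it short. It computes design weights as inverse inclusion probabilities, fits the weighted least squares line and applies a simple trimming rule. The cap-at-a-multiple-of-the-median rule and its `cap_factor` default are hypothetical choices for illustration, not the trimming procedure used in the thesis.

```python
from statistics import median


def design_weights(inclusion_probs):
    """Design weight of each sampled unit: the inverse of its inclusion probability."""
    return [1.0 / p for p in inclusion_probs]


def trim_weights(weights, cap_factor=3.0):
    """Hypothetical single-pass rule: cap each weight at cap_factor times the
    median weight, then rescale so that the total weight is unchanged."""
    cap = cap_factor * median(weights)
    capped = [min(wi, cap) for wi in weights]
    scale = sum(weights) / sum(capped)
    return [wi * scale for wi in capped]


def weighted_slr(x, y, w):
    """Intercept and slope minimising sum(w_i * (y_i - a - b * x_i) ** 2).

    With w_i equal to the design weights this gives the survey-weighted point
    estimates; with all w_i = 1 it reduces to ordinary least squares.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Illustrative data: OLS ignores the design, SWLS uses the design weights.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 2.0, 10.0]
w = design_weights([0.5, 0.25, 0.1, 0.01])
print(weighted_slr(x, y, [1.0] * len(x)))    # OLS fit
print(weighted_slr(x, y, w))                 # survey-weighted fit
print(weighted_slr(x, y, trim_weights(w)))   # fit with trimmed weights
```

When WLS is run with the survey weights as its weights, its point estimates coincide with the SWLS point estimates. The difference lies in variance estimation: model-based WLS standard errors treat the weights as inverse variances and ignore the strata and clusters. The single rescaling step can also push a trimmed weight back above the cap, as it does for the example weights `[2, 4, 10, 100]`, which is why practical trimming procedures usually iterate.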
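The second thesis also uses bootstrap variance estimation for the model parameters. One well-known bootstrap for stratified multistage samples is the Rao-Wu rescaling bootstrap. The sketch below applies it to the survey-weighted slope, resampling n_h - 1 primary sampling units (PSUs) per stratum. It rests on several assumptions: at least two PSUs per stratum, PSU labels unique within a stratum, and first-stage sampling treated as with replacement. It illustrates the general idea and is not the specific procedure of the thesis. The function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def weighted_slope(x, y, w):
    """Slope of the weighted least squares line (survey-weighted point estimate)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def rao_wu_bootstrap_se(x, y, w, stratum, psu, reps=500, seed=1):
    """Rao-Wu rescaling bootstrap standard error of the survey-weighted slope.

    In each stratum h with n_h PSUs, draw n_h - 1 PSUs with replacement; every
    unit in PSU c gets replicate weight w * n_h / (n_h - 1) * r_c, where r_c is
    the number of times PSU c was drawn.
    """
    rng = random.Random(seed)
    groups = defaultdict(set)
    for h, c in zip(stratum, psu):
        groups[h].add(c)
    psus_by_stratum = {h: sorted(cs) for h, cs in groups.items()}
    for h, cs in psus_by_stratum.items():
        if len(cs) < 2:
            raise ValueError(f"stratum {h!r} needs at least two PSUs")
    estimates = []
    for _ in range(reps):
        multiplier = {}
        for h, cs in psus_by_stratum.items():
            n_h = len(cs)
            counts = defaultdict(int)
            for _draw in range(n_h - 1):
                counts[rng.choice(cs)] += 1
            for c in cs:
                multiplier[(h, c)] = n_h / (n_h - 1.0) * counts[c]
        w_star = [wi * multiplier[(h, c)] for wi, h, c in zip(w, stratum, psu)]
        estimates.append(weighted_slope(x, y, w_star))
    mean = sum(estimates) / reps
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (reps - 1))
```

The replicate slopes could equally feed the percentile or bootstrap-t intervals mentioned in the abstract. Only the standard error is computed here.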