Department of Statistics and Actuarial Science
Browsing Department of Statistics and Actuarial Science by advisor "De Wet, Tertius"
Now showing 1 - 18 of 18
- Item: An analysis of income and poverty in South Africa (Stellenbosch : University of Stellenbosch, 2007-03) Malherbe, Jeanine Elizabeth; De Wet, Tertius; Viljoen, H.; Neethling, Ariane; University of Stellenbosch. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. The aim of this study is to assess the welfare of South Africa in terms of poverty and inequality. This is done using the Income and Expenditure Survey (IES) of 2000, released by Statistics South Africa, and reviewing the distribution of income in the country. A brief literature review of similar studies is given, along with a broad definition of poverty and inequality. A detailed description of the dataset used is given, together with aspects of concern surrounding the dataset. An analysis of poverty and income inequality is made using datasets containing the continuous income variable, as well as a created grouped income variable. Results from these datasets are compared and conclusions drawn on the use of continuous or grouped income variables. Covariate analysis is also applied in the form of biplots. A brief overview of biplots is given, and they are then used to obtain a graphical description of the data and to identify any patterns. Lastly, the conclusions made in this study are put forward and some future research is mentioned.
- Item: Aspects of copulas and goodness-of-fit (Stellenbosch : Stellenbosch University, 2008-12) Kpanzou, Tchilabalo Abozou; De Wet, Tertius; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. The goodness-of-fit of a statistical model describes how well it fits a set of observations. Measures of goodness-of-fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, for example to test for normality, to test whether two samples are drawn from identical distributions, or whether outcome frequencies follow a specified distribution. Goodness-of-fit for copulas is a special case of the more general problem of testing multivariate models, but is complicated due to the difficulty of specifying marginal distributions. In this thesis, the goodness-of-fit test statistics for general distributions and the tests for copulas are investigated, but prior to that an understanding of copulas and their properties is developed. In fact, copulas are useful tools for understanding relationships among multivariate variables, and are important tools for describing the dependence structure between random variables. Several univariate, bivariate and multivariate test statistics are investigated, the emphasis being on tests for normality. Among goodness-of-fit tests for copulas, tests based on the probability integral transform, Rosenblatt's transformation, as well as some dimension reduction techniques are considered. Bootstrap procedures are also described. Simulation studies are conducted to first compare the power of rejection of the null hypothesis of the Clayton copula by four different test statistics under the alternative of the Gumbel-Hougaard copula, and also to compare the power of rejection of the null hypothesis of the Gumbel-Hougaard copula under the alternative of the Clayton copula. An application of the described techniques is made to a practical data set.
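For reference, the two Archimedean copula families compared in the simulation study above have the following standard bivariate forms (the parameter theta governs the strength of dependence in each family); this is the usual textbook parameterisation, not necessarily the exact notation of the thesis:

```latex
% Bivariate Clayton copula (theta > 0) and Gumbel-Hougaard copula (theta >= 1),
% as commonly parameterized in the copula literature.
\[
C^{\mathrm{Cl}}_{\theta}(u,v) \;=\; \left(u^{-\theta} + v^{-\theta} - 1\right)^{-1/\theta},
\qquad
C^{\mathrm{GH}}_{\theta}(u,v) \;=\; \exp\!\left\{-\left[(-\ln u)^{\theta} + (-\ln v)^{\theta}\right]^{1/\theta}\right\}.
\]
```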
- Item: Aspects of model development using regression quantiles and elemental regressions (Stellenbosch : Stellenbosch University, 2007-03) Ranganai, Edmore; De Wet, Tertius; Van Vuuren, J.O.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. ENGLISH ABSTRACT: It is well known that ordinary least squares (OLS) procedures are sensitive to deviations from the classical Gaussian assumptions (outliers) as well as data aberrations in the design space. The two major data aberrations in the design space are collinearity and high leverage. Leverage points can also induce or hide collinearity in the design space. Such leverage points are referred to as collinearity influential points. As a consequence, over the years, many diagnostic tools to detect these anomalies, as well as alternative procedures to counter them, were developed. To counter deviations from the classical Gaussian assumptions many robust procedures have been proposed. One such class of procedures is the Koenker and Bassett (1978) Regression Quantiles (RQs), which are natural extensions of order statistics to the linear model. RQs can be found as solutions to linear programming problems (LPs). The basic optimal solutions to these LPs (which are RQs) correspond to elemental subset (ES) regressions, which consist of subsets of minimum size to estimate the necessary parameters of the model. On the one hand, some ESs correspond to RQs. On the other hand, in the literature it is shown that many OLS statistics (estimators) are related to ES regression statistics (estimators). Therefore there is an inherent relationship amongst the three sets of procedures. The relationship between the ES procedure and the RQ one has been noted almost “casually” in the literature, while the latter has been fairly widely explored. Using these existing relationships between the ES procedure and the OLS one, as well as new ones, collinearity, leverage and outlier problems in the RQ scenario were investigated. Also, a lasso procedure was proposed as a variable selection technique in the RQ scenario and some tentative results were given for it. These results are promising. Single case diagnostics were considered as well as their relationships to multiple case ones. In particular, multiple cases of the minimum size to estimate the necessary parameters of the model were considered, corresponding to a RQ (ES). In this way regression diagnostics were developed for both ESs and RQs. The main problems that affect RQs adversely are collinearity and leverage, due to the nature of the computational procedures and the fact that RQs’ influence functions are unbounded in the design space but bounded in the response variable. As a consequence of this, RQs have a high affinity for leverage points and a high exclusion rate of outliers. The influential picture exhibited in the presence of both leverage points and outliers is the net result of these two antagonistic forces. Although RQs are bounded in the response variable (and therefore fairly robust to outliers), outlier diagnostics were also considered in order to have a more holistic picture. The investigations used comprised analytic means as well as simulation. Furthermore, applications were made to artificial computer generated data sets as well as standard data sets from the literature. These revealed that the ES based statistics can be used to address problems arising in the RQ scenario to some degree of success.
However, due to the interdependence between the different aspects, viz. the one between leverage and collinearity and the one between leverage and outliers, “solutions” are often dependent on the particular situation. In spite of this complexity, the research did produce some fairly general guidelines that can be fruitfully used in practice.
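As context for the abstract above, the tau-th regression quantile of Koenker and Bassett (1978) is defined through the asymmetric absolute loss ("check") function; a brief LaTeX statement of this standard definition:

```latex
% tau-th regression quantile (Koenker & Bassett, 1978): minimize the check-function loss.
\[
\hat{\beta}(\tau) \;=\; \arg\min_{\beta \in \mathbb{R}^{p}} \sum_{i=1}^{n} \rho_{\tau}\!\left(y_i - x_i^{\top}\beta\right),
\qquad
\rho_{\tau}(u) \;=\; u\left(\tau - I(u < 0)\right), \quad 0 < \tau < 1.
\]
```

Rewriting this minimization with positive and negative slack variables gives the linear programming formulation referred to in the abstract, whose basic optimal solutions correspond to elemental subset regressions.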
- Item: Classifying yield spread movements in sparse data through triplots (Stellenbosch : Stellenbosch University, 2020-03) Van der Merwe, Carel Johannes; De Wet, Tertius; Inghelbrecht, Koen; Vanmaele, Michele; Conradie, W. J. (Willem Johannes); Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. ENGLISH SUMMARY: In many developing countries, including South Africa, all data that are required to calculate the fair values of financial instruments are not always readily available. Additionally, in some instances, companies who do not have the necessary quantitative skills are reluctant to incorporate the correct fair valuation by failing to employ the appropriate techniques. This problem is most notable with regards to unlisted debt instruments. There are two main inputs with regards to the valuation of unlisted debt instruments, namely the risk-free curve and the yield spread. Investigation into these two components forms the basis of this thesis. Firstly, an analysis is carried out to derive approximations of risk-free curves in areas where data is sparse. Thereafter it is investigated whether there is sufficient evidence of a significant change in yield spreads of unlisted debt instruments. In order to determine these changes, however, a new method that allows for simultaneous visualisation and classification of data was developed - termed triplot classification with polybags. This new classification technique also has the ability to limit misclassification rates. In the first paper, a proxy for the extended zero curve, calculated from other observable inputs, is found through a simulation approach by incorporating two new techniques, namely permuted integer multiple linear regression and aggregate standardised model scoring. It was found that a Nelson-Siegel fit, with a mixture of one year forward rates as proxies for the long term zero point, and some discarding of initial data points, performs relatively well in the training and testing data sets. This new method allows for the approximation of risk-free curves where no long term points are available, and further allows the determinants of the yield curve shape to be studied by considering other available data. The changes in these shape determining parameters are used in the final paper as determinants for changes in yield spreads. For the second paper, a new classification technique is developed that is used in the final paper. Classification techniques do not easily allow for visual interpretation, nor do they usually allow for the limitation of the false negative and positive error rates. For some areas of research and practical applications these shortcomings are important to address. In this paper, classification techniques are combined with biplots, allowing for simultaneous visual representation and classification of the data, resulting in the so-called triplot. By further incorporating polybags, the ability to limit misclassification type errors is also introduced. A simulation study as well as an application is provided, showing that the method provides similar results compared to existing methods, but with added visualisation benefits. The paper focuses purely on developing a statistical technique that can be applied to any field. The application that is provided, for example, is on a medical data set. In the final paper the technique is applied to changes in yield spreads.
The third paper considers changes in yield spreads, which are analysed through various covariates to determine whether significant decreases or increases would have been observed for unlisted debt instruments. The methodology does not specifically determine the new spread, but gives evidence on whether the initial implied spread could be left the same, or whether a new spread should be determined. These yield spread movements are classified using various share, interest rate, financial ratio, and economic type covariates in a visually interpretive manner. This also allows for a better understanding of how various factors drive the changes in yield spreads. Finally, as a supplement to each paper, a web-based application was built allowing the reader to interact with all the data and properties of the methodologies discussed. The following links can be used to access these three applications: - Paper 1: https://carelvdmerwe.shinyapps.io/ProxyCurve/ - Paper 2: https://carelvdmerwe.shinyapps.io/TriplotSimulation/ - Paper 3: https://carelvdmerwe.shinyapps.io/SpreadsTriplot/
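For readers unfamiliar with the Nelson-Siegel fit mentioned in the first paper above, the standard form of the curve expresses the rate at maturity tau in terms of level, slope and curvature parameters and a decay parameter lambda; this is the textbook parameterisation, and the thesis's exact variant may differ:

```latex
% Nelson-Siegel yield curve: level (beta_0), slope (beta_1) and curvature (beta_2)
% components with decay parameter lambda > 0.
\[
y(\tau) \;=\; \beta_0
\;+\; \beta_1\,\frac{1 - e^{-\tau/\lambda}}{\tau/\lambda}
\;+\; \beta_2\left(\frac{1 - e^{-\tau/\lambda}}{\tau/\lambda} - e^{-\tau/\lambda}\right).
\]
```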
- Item: Comparison of methods to calculate measures of inequality based on interval data (Stellenbosch : Stellenbosch University, 2015-12) Neethling, Willem Francois; De Wet, Tertius; Neethling, Ariane; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. ENGLISH ABSTRACT: In recent decades, economists and sociologists have taken an increasing interest in the study of income attainment and income inequality. Many of these studies have used census data, but social surveys have also increasingly been utilised as sources for these analyses. In these surveys, respondents’ incomes are most often not measured in true amounts, but in categories of which the last category is open-ended. The reason is that income is seen as sensitive data and/or is sometimes difficult to reveal. Continuous data divided into categories is often more difficult to work with than ungrouped data. In this study, we compare different methods to convert grouped data to data where each observation has a specific value or point. For some methods, all the observations in an interval receive the same value; an example is the midpoint method, where all the observations in an interval are assigned the midpoint. Other methods include random methods, where each observation receives a random point between the lower and upper bound of the interval. For some methods, random and non-random, a distribution is fitted to the data and a value is calculated according to the distribution. The non-random methods that we use are the midpoint, Pareto means and lognormal means methods; the random methods are the random midpoint, random Pareto and random lognormal methods. Since our focus falls on income data, which usually follows a heavy-tailed distribution, we use the Pareto and lognormal distributions in our methods. The above-mentioned methods are applied to simulated and real datasets. The raw values of these datasets are known, and are categorised into intervals. These methods are then applied to the interval data to reconvert the interval data to point data. To test the effectiveness of these methods, we calculate some measures of inequality. The measures considered are the Gini coefficient, quintile share ratio (QSR), the Theil measure and the Atkinson measure. The estimated measures of inequality, calculated from each dataset obtained through these methods, are then compared to the true measures of inequality.
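A minimal sketch of the midpoint method and a plug-in Gini coefficient, using hypothetical income brackets and counts rather than the real data or the thesis's exact estimators; Python is used here purely for illustration:

```python
import numpy as np

def gini(x):
    """Plug-in Gini coefficient: G = sum_i (2i - n - 1) * x_(i) / (n * sum(x)), x sorted ascending."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    return ((2 * i - n - 1) * x).sum() / (n * x.sum())

# Midpoint method: every observation in an income bracket is assigned the bracket midpoint.
# The brackets and frequencies below are hypothetical, for illustration only.
bounds = [(0, 500), (500, 1000), (1000, 2000), (2000, 5000)]
counts = [120, 340, 280, 90]
midpoints = np.repeat([(a + b) / 2.0 for a, b in bounds], counts)

print(round(gini(midpoints), 3))
```

A random-midpoint variant would instead draw each observation uniformly between its bracket bounds, while the Pareto and lognormal variants first fit those distributions to the grouped data and assign values accordingly.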
- Item: Confidence intervals for estimators of welfare indices under complex sampling (Stellenbosch : University of Stellenbosch, 2010-03) Kirchoff, Retha; De Wet, Tertius; Neethling, Ariane; University of Stellenbosch. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. ENGLISH ABSTRACT: The aim of this study is to obtain estimates and confidence intervals for welfare indices under complex sampling. It begins by looking at sampling in general with specific focus on complex sampling and weighting. For the estimation of the welfare indices, two resampling techniques, viz. jackknife and bootstrap, are discussed. They are used for the estimation of bias and standard error under simple random sampling and complex sampling. Three confidence intervals are discussed, viz. standard (asymptotic), percentile and bootstrap-t. An overview of welfare indices and their estimation is given. The indices are categorized into measures of poverty and measures of inequality. Two Laeken indices, viz. at-risk-of-poverty and quintile share ratio, are included in the discussion. The study considers two poverty lines, namely an absolute poverty line based on percy (ratio of total household income to household size) and a relative poverty line based on equivalized income (ratio of total household income to equivalized household size). The data set used as surrogate population for the study is the Income and Expenditure Survey 2005/2006 conducted by Statistics South Africa and details of it are provided and discussed. An analysis of simulation data from the surrogate population was carried out using the techniques mentioned above and the results were graphed, tabulated and discussed. Two issues were considered, namely whether the design of the survey should be considered and whether resampling techniques provide reliable results, especially for confidence intervals. The results were a “mixed bag”. Overall, however, it was found that weighting showed promise in many cases, especially in the improvement of the coverage probabilities of the confidence intervals. It was also found that the bootstrap resampling technique was reliable (by looking at standard errors). Further research options are mentioned as possible solutions towards the mixed results.
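As a minimal sketch of the percentile bootstrap interval discussed above, the code below computes a weighted quintile share ratio and resamples observations with replacement. It assumes simulated lognormal incomes and uniform weights rather than the survey surrogate population, and uses naive i.i.d. resampling rather than a design-based bootstrap that would resample clusters within strata:

```python
import numpy as np

def qsr(income, weights):
    """Weighted quintile share ratio: income share of the top 20% over the bottom 20%."""
    order = np.argsort(income)
    inc, w = income[order], weights[order]
    cum = np.cumsum(w) / w.sum()
    lo, hi = cum <= 0.2, cum >= 0.8
    return (inc[hi] * w[hi]).sum() / (inc[lo] * w[lo]).sum()

rng = np.random.default_rng(1)
income = rng.lognormal(mean=9.0, sigma=1.0, size=2000)   # hypothetical incomes
weights = rng.uniform(1.0, 5.0, size=2000)               # hypothetical survey weights

n = len(income)
boot = np.array([qsr(income[idx], weights[idx])
                 for idx in (rng.integers(0, n, n) for _ in range(1000))])
lower, upper = np.percentile(boot, [2.5, 97.5])          # 95% percentile interval
print(round(qsr(income, weights), 2), (round(lower, 2), round(upper, 2)))
```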
- Item: Extreme quantile inference (Stellenbosch : Stellenbosch University, 2020-03) Buitendag, Sven; De Wet, Tertius; Beirlant, Jan; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. ENGLISH SUMMARY: A novel approach to performing extreme quantile inference is proposed by applying ridge regression and the saddlepoint approximation to results in extreme value theory. To this end, ridge regression is applied to the log differences of the largest sample quantiles to obtain a bias-reduced estimator of the extreme value index, which is a parameter in extreme value theory that plays a central role in the estimation of extreme quantiles. The utility of the ridge regression estimators for the extreme value index is illustrated by means of simulation results and applications to daily wind speeds. A new pivotal quantity is then proposed with which a set of novel asymptotic confidence intervals for extreme quantiles are obtained. The ridge regression estimator for the extreme value index is combined with the proposed pivotal quantity together with the saddlepoint approximation to yield a set of confidence intervals that are accurate and narrow. The utility of these confidence intervals is illustrated by means of simulation results and applications to Belgian reinsurance data. Multivariate generalizations of sample quantiles are considered with the aim of developing multivariate risk measures, including maximum correlation risk measures and an estimator for the extreme value index. These multivariate sample quantiles are called center-outward quantiles, and are defined as an optimal transportation of the uniformly distributed points in the unit ball S^d to the observed sample points in R^d. A continuous extension of the center-outward quantile is proposed, which yields quantile contours that are nested. Furthermore, maximum correlation risk measures for multivariate samples are presented, as well as an estimator for the extreme value index for multivariate regularly varying samples. These results are applied to Danish fire insurance data and the stock returns of Google and Apple shares to illustrate their utility.
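As background to the extreme value index and extreme quantile estimation discussed above, the classical (non-bias-reduced) benchmarks are the Hill estimator and the Weissman quantile estimator, both built from the k largest order statistics of the sample; these are the standard forms, which the thesis's ridge regression estimator refines:

```latex
% Hill estimator of a positive extreme value index, and the Weissman estimator of the
% extreme quantile at small tail probability p, both based on the k largest observations.
\[
\hat{\gamma}_{k,n}^{H} \;=\; \frac{1}{k}\sum_{i=1}^{k}\log X_{n-i+1,n} \;-\; \log X_{n-k,n},
\qquad
\hat{q}_{p} \;=\; X_{n-k,n}\left(\frac{k}{np}\right)^{\hat{\gamma}_{k,n}^{H}}.
\]
```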
- Item: Extreme value-based novelty detection (Stellenbosch : Stellenbosch University, 2017-12) Steyn, Matthys Lucas; De Wet, Tertius; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. ENGLISH SUMMARY: This dissertation investigates extreme value-based novelty detection. An in-depth review of the theoretical proofs and an analytical investigation of current novelty detection methods are given. It is concluded that the use of extreme value theory for novelty detection leads to superior results. The first part of this dissertation provides an overview of novelty detection and the various methods available to construct a novelty detection algorithm. Four broad approaches are discussed, with this dissertation focusing on probabilistic novelty detection. A summary of the applications of novelty detection and the properties of an efficient novelty detection algorithm are also provided. The theory of extremes plays a vital role in this work. Therefore, a comprehensive description of the main theorems and modelling approaches of extreme value theory is given. These results are used to construct various novelty detection algorithms based on extreme value theory. The first extreme value-based novelty detection algorithm is termed the Winner-Takes-All method. The model’s strong theoretical underpinning as well as its disadvantages are discussed. The second method reformulates extreme value theory in terms of extreme probability density. This definition is utilised to derive a closed-form expression of the probability distribution of a Gaussian probability density. It is shown that this distribution is in the minimum domain of attraction of the extremal Weibull distribution. Two other methods to perform novelty detection with extreme value theory are explored, namely the numerical approach and the approach based on modern extreme value theory. Both these methods approximate the distribution of the extreme probability density values under the assumption of a Gaussian mixture model. In turn, novelty detection can be performed in complex settings using extreme value theory. To demonstrate an application of the discussed methods a banknote authentication dataset is analysed. It is clearly shown that extreme value-based novelty detection methods are extremely efficient in detecting forged banknotes. This demonstrates the practicality of the different approaches. The concluding chapter compares the theoretical justification, predictive power and efficiency of the different approaches. Proposals for future research are also discussed.
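The sketch below illustrates the general idea of probabilistic novelty detection under a Gaussian mixture model: flag test points whose density under the fitted mixture is extremely low. It uses a simple empirical cut-off on the training log-densities, whereas the dissertation calibrates this tail threshold using extreme value theory; the data, component count and quantile level are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))                       # "normal" training data
X_test = np.vstack([rng.normal(size=(10, 2)),             # typical points
                    rng.normal(loc=6.0, size=(10, 2))])   # novelties far from the bulk

gm = GaussianMixture(n_components=3, random_state=0).fit(X_train)

log_dens_train = gm.score_samples(X_train)     # log density of each training point
threshold = np.quantile(log_dens_train, 0.01)  # naive 1% cut-off; the thesis instead models
                                               # the extreme (minimum) densities with EVT

is_novel = gm.score_samples(X_test) < threshold
print(is_novel.astype(int))
```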
- Item: Improved estimation procedures for a positive extreme value index (Stellenbosch : University of Stellenbosch, 2010-12) Berning, Thomas Louw; De Wet, Tertius; University of Stellenbosch. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. ENGLISH ABSTRACT: In extreme value theory (EVT) the emphasis is on extreme (very small or very large) observations. The crucial parameter when making inferences about extreme quantiles is called the extreme value index (EVI). This thesis concentrates on only the right tail of the underlying distribution (extremely large observations), and specifically situations where the EVI is assumed to be positive. A positive EVI indicates that the underlying distribution of the data has a heavy right tail, as is the case with, for example, insurance claims data. There are numerous areas of application of EVT, since there are a vast number of situations in which one would be interested in predicting extreme events accurately. Accurate prediction requires accurate estimation of the EVI, which has received ample attention in the literature from a theoretical as well as practical point of view. Countless estimators of the EVI exist in the literature, but the practitioner has little information on how these estimators compare. An extensive simulation study was designed and conducted to compare the performance of a wide range of estimators, over a wide range of sample sizes and distributions. A new procedure for the estimation of a positive EVI was developed, based on fitting the perturbed Pareto distribution (PPD) to observations above a threshold, using Bayesian methodology. Attention was also given to the development of a threshold selection technique. One of the major contributions of this thesis is a measure which quantifies the stability (or rather instability) of estimates across a range of thresholds. This measure can be used to objectively obtain the range of thresholds over which the estimates are most stable. It is this measure which is used for the purpose of threshold selection for the proposed PPD estimator. A case study of five insurance claims data sets illustrates how data sets can be analyzed in practice. It is shown to what extent discretion can/should be applied, as well as how different estimators can be used in a complementary fashion to give more insight into the nature of the data and the extreme tail of the underlying distribution. The analysis is carried out from the point of raw data, to the construction of tables which can be used directly to gauge the risk of the insurance portfolio over a given time frame.
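The stability-across-thresholds idea above can be pictured with the standard Hill-plot diagnostic: compute an estimate at every candidate threshold (number of top order statistics k) and look for a range of k over which the estimates are flat. The sketch below does this with the ordinary Hill estimator on simulated Pareto data; it is not the thesis's PPD-based Bayesian estimator, and the rolling standard deviation is only a crude stand-in for its stability measure.

```python
import numpy as np

def hill(x, k):
    """Hill estimate of a positive EVI using the k largest observations of x."""
    xs = np.sort(x)
    return np.mean(np.log(xs[-k:])) - np.log(xs[-k - 1])

rng = np.random.default_rng(7)
x = rng.pareto(a=2.0, size=5000) + 1.0      # Pareto tail with true EVI = 1/a = 0.5

ks = np.arange(20, 1000, 10)
estimates = np.array([hill(x, k) for k in ks])

# Crude stability summary: rolling standard deviation of the estimates over windows of k.
window = 10
rolling_sd = np.array([estimates[i:i + window].std() for i in range(len(ks) - window)])
print("most stable k-range starts near k =", ks[rolling_sd.argmin()])
```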
- Item: Lévy processes and quantum mechanics : an investigation into the distribution of log returns (Stellenbosch : Stellenbosch University, 2021-03) Le Roux, Christiaan Hugo; De Wet, Tertius; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. ENGLISH SUMMARY: It is well known that log returns on stocks do not follow a normal distribution as is assumed under the Black-Scholes pricing formula. This study investigates alternatives to Brownian Motion which are better suited to capture the stylized facts of asset returns. Lévy processes and models based on Quantum Mechanical theory are described and fitted to daily log returns for various JSE Indices. Maximum likelihood estimation is used to estimate the parameters of the Lévy processes, and the Cramér-von Mises goodness-of-fit statistic is minimized to estimate the parameters of the Quantum Mechanical models. Q-Q plots and the Kolmogorov-Smirnov fit statistic are presented to assess the fit of the various models. The results show that the Lévy processes, specifically the Normal Inverse Gaussian process, are the best among the processes considered. The performance of the Quantum Mechanical models could be improved if more eigenstates are considered in the approximation; however, the computational expense of these models makes them impractical.
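A minimal sketch of fitting a Normal Inverse Gaussian distribution by maximum likelihood and checking the fit with the Kolmogorov-Smirnov statistic, using simulated returns in place of the JSE index data; scipy's norminvgauss parameterisation (a, b, loc, scale) is assumed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated stand-in for daily log returns (heavier-tailed than the normal distribution).
log_returns = stats.norminvgauss.rvs(a=2.0, b=-0.3, loc=0.0, scale=0.01,
                                     size=2000, random_state=rng)

params = stats.norminvgauss.fit(log_returns)            # maximum likelihood estimates
ks = stats.kstest(log_returns, 'norminvgauss', args=params)

print("fitted (a, b, loc, scale):", np.round(params, 3))
print("KS statistic:", round(ks.statistic, 4))          # note: the p-value is optimistic when
                                                        # parameters are estimated from the data
```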
- Item: 'n Ondersoek na die eindige steekproefgedrag van inferensiemetodes in ekstreemwaarde-teorie [An investigation into the finite sample behaviour of inference methods in extreme value theory] (Stellenbosch : University of Stellenbosch, 2005-03) Van Deventer, Dewald; De Wet, Tertius; University of Stellenbosch. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. Extremes are unusual or rare events. However, when such events (for example earthquakes, tidal waves and market crashes) do take place, they typically cause enormous losses, both in terms of human lives and monetary value. For this reason, it is of critical importance to accurately model extremal events. Extreme value theory entails the development of statistical models and techniques in order to describe and model such rare observations. In this document we discuss aspects of extreme value theory. This theory consists of two approaches: the classical maxima method, based on the properties of the maximum of a sample, and the more popular threshold theory, based upon the properties of exceedances of a specified threshold value. This document provides the practitioner with the theoretical and practical tools for both these approaches. This will enable him/her to perform extreme value analyses with confidence. Extreme value theory, for both approaches, is based upon asymptotic arguments. For finite samples, the limiting result for the sample maximum holds approximately only. Similarly, for finite choices of the threshold, the limiting distribution for exceedances of that threshold holds only approximately. In this document we investigate the quality of extreme value based inferences with regard to the unknown underlying distribution when the sample size or threshold is finite. Estimation of extreme tail quantiles of the underlying distribution, as well as the calculation of confidence intervals, are typically the most important objectives of an extreme value analysis. For that reason, we evaluate the accuracy of extreme value based inferences in terms of these estimates. This investigation was carried out using a simulation study, performed with the software package S-Plus.
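The two approaches described above rest on two standard limit results: block maxima are approximately generalized extreme value (GEV) distributed, and exceedances over a high threshold u are approximately generalized Pareto distributed; a brief LaTeX statement of these textbook forms:

```latex
% GEV distribution for normalized maxima, and the GPD approximation for threshold exceedances
% (xi is the extreme value index; sigma and sigma_u are positive scale parameters).
\[
G_{\xi}(x) \;=\; \exp\!\left\{-\left(1 + \xi\,\frac{x-\mu}{\sigma}\right)^{-1/\xi}\right\},
\qquad
P\!\left(X - u > y \mid X > u\right) \;\approx\; \left(1 + \xi\,\frac{y}{\sigma_u}\right)^{-1/\xi},
\]
with both expressions interpreted as their exponential limits when $\xi = 0$.
```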
- Item: The saddle-point method and its application to the Hill estimator (Stellenbosch : Stellenbosch University, 2016-12) Buitendag, Sven; De Wet, Tertius; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics & Actuarial Science. ENGLISH SUMMARY: The saddle-point approximation is a highly accurate approximation of the distribution of a random variable. It was originally derived as an approximation in situations where a parameter takes on large values. However, due to its high accuracy and good behaviour in a variety of applications not involving such a parameter, it has been generalized and applied to the distribution of any random variable with a well-behaved cumulant generating function. In this thesis the theory underlying the saddle-point approximation will be discussed and illustrated with an application to approximate the distribution of the Hill estimator in extreme value theory.
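For context, the saddle-point density approximation for a random variable X with cumulant generating function K is given below in its standard first-order form; this is the textbook statement, not the thesis's specific development for the Hill estimator:

```latex
% First-order saddle-point approximation to the density of X with CGF K(t) = log E[e^{tX}];
% the saddle-point s_hat solves K'(s_hat) = x.
\[
\hat{f}_X(x) \;=\; \frac{1}{\sqrt{2\pi\,K''(\hat{s})}}\,
\exp\!\left\{K(\hat{s}) - \hat{s}\,x\right\},
\qquad \text{where } K'(\hat{s}) = x.
\]
```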
- Item: Some statistical aspects of LULU smoothers (Stellenbosch : University of Stellenbosch, 2007-12) Jankowitz, Maria Dorothea; Conradie, W. J.; De Wet, Tertius; University of Stellenbosch. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. The smoothing of time series plays a very important role in various practical applications. Estimating the signal and removing the noise is the main goal of smoothing. Traditionally linear smoothers were used, but nonlinear smoothers became more popular through the years. From the family of nonlinear smoothers, the class of median smoothers, based on order statistics, is the most popular. A new class of nonlinear smoothers, called LULU smoothers, was developed by using the minimum and maximum selectors. These smoothers have very attractive mathematical properties. In this thesis their statistical properties are investigated and compared to those of the class of median smoothers. Smoothing, together with related concepts, is discussed in general. Thereafter, the class of median smoothers from the literature is discussed. The class of LULU smoothers is defined, their properties are explained and new contributions are made. The compound LULU smoother is introduced and its property of variation decomposition is discussed. The probability distributions of some LULU smoothers with independent data are derived. LULU smoothers and median smoothers are compared according to the properties of monotonicity, idempotency, co-idempotency, stability, edge preservation, output distributions and variation decomposition. A comparison is made of their respective abilities for signal recovery by means of simulations. The success of the smoothers in recovering the signal is measured by the integrated mean square error and the regression coefficient calculated from the least squares regression of the smoothed sequence on the signal. Finally, LULU smoothers are practically applied.
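A minimal sketch of the basic minimum and maximum selectors underlying LULU smoothers, following the width-n operators as commonly defined in the LULU literature (the compound smoothers studied in the thesis are compositions of these); the edge handling, which simply copies the endpoints, is an assumption of this sketch:

```python
import numpy as np

def L(x, n=1):
    """(L_n x)_i = max over the n+1 windows containing i of the window minimum."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for i in range(n, len(x) - n):
        out[i] = max(x[j:j + n + 1].min() for j in range(i - n, i + 1))
    return out

def U(x, n=1):
    """(U_n x)_i = min over the n+1 windows containing i of the window maximum."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for i in range(n, len(x) - n):
        out[i] = min(x[j:j + n + 1].max() for j in range(i - n, i + 1))
    return out

# L_n removes upward "spikes" of width <= n, U_n removes downward spikes;
# applying them in sequence gives the compound LULU smoothers.
x = np.array([0, 0, 0, 5, 0, 0, -4, 0, 0, 0], dtype=float)
print(L(x, 1))         # the upward spike at index 3 is removed
print(U(L(x, 1), 1))   # the remaining downward spike is removed as well
```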
- Item: South African security market imperfections (Stellenbosch : University of Stellenbosch, 2006-03) Jooste, Dirk; De Wet, Tertius; University of Stellenbosch. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. In recent times many theories have surfaced posing challenging threats to the Efficient Market Hypothesis. We are entering an exciting era of financial economics fueled by the urge to have a better understanding of the intricate workings of financial markets. Many studies are emerging that investigate the relationship between stock market predictability and efficiency. This paper studies the existence of calendar-based patterns in equity returns, price momentum and earnings momentum in the South African securities market. These phenomena are commonly referred to in the literature as security market imperfections, financial market puzzles and market anomalies. We provide evidence that suggests that they do exist in the South African context, which is consistent with findings in various international markets. A vast number of papers on the subject exist in the international arena. However, very few empirical studies on the South African market can be found in the public domain. We aim to contribute to the literature by investigating the South African case.
- Item: Statistical inference for inequality measures based on semi-parametric estimators (Stellenbosch : Stellenbosch University, 2011-12) Kpanzou, Tchilabalo Abozou; De Wet, Tertius; Neethling, Ariane; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. ENGLISH ABSTRACT: Measures of inequality, also used as measures of concentration or diversity, are very popular in economics and especially in measuring the inequality in income or wealth within a population and between populations. However, they have applications in many other fields, e.g. in ecology, linguistics, sociology, demography, epidemiology and information science. A large number of measures have been proposed to measure inequality. Examples include the Gini index, the generalized entropy, the Atkinson and the quintile share ratio measures. Inequality measures are inherently dependent on the tails of the population (underlying distribution) and therefore their estimators are typically sensitive to data from these tails (nonrobust). For example, income distributions often exhibit a long tail to the right, leading to the frequent occurrence of large values in samples. Since the usual estimators are based on the empirical distribution function, they are usually nonrobust to such large values. Furthermore, heavy-tailed distributions often occur in real life data sets; remedial action therefore needs to be taken in such cases. The remedial action can be either a trimming of the extreme data or a modification of the (traditional) estimator to make it more robust to extreme observations. In this thesis we follow the second option, modifying the traditional empirical distribution function as estimator to make it more robust. Using results from extreme value theory, we develop more reliable distribution estimators in a semi-parametric setting. These new estimators of the distribution then form the basis for more robust estimators of the measures of inequality. These estimators are developed for the four most popular classes of measures, viz. Gini, generalized entropy, Atkinson and quintile share ratio. Properties of such estimators are studied especially via simulation. Using limiting distribution theory and the bootstrap methodology, approximate confidence intervals were derived. Through the various simulation studies, the proposed estimators are compared to the standard ones in terms of mean squared error, relative impact of contamination, confidence interval length and coverage probability. In these studies the semi-parametric methods show a clear improvement over the standard ones. The theoretical properties of the quintile share ratio have not been studied much. Consequently, we also derive its influence function as well as the limiting normal distribution of its nonparametric estimator. These results have not previously been published. In order to illustrate the methods developed, we apply them to a number of real life data sets. Using such data sets, we show how the methods can be used in practice for inference. In order to choose between the candidate parametric distributions, use is made of a measure of sample representativeness from the literature. These illustrations show that the proposed methods can be used to reach satisfactory conclusions in real life problems.
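Semi-parametric distribution estimators of the kind referred to above typically take the following standard form from extreme value theory: the empirical distribution function below a high threshold u, spliced with a fitted generalized Pareto tail above it (the thesis's exact estimators may refine this); N_u denotes the number of exceedances of u among the n observations:

```latex
% Standard semi-parametric tail estimator: empirical CDF below the threshold u, generalized
% Pareto tail (parameters xi, sigma fitted to the N_u exceedances) above it.
\[
\hat{F}(x) \;=\;
\begin{cases}
F_n(x), & x \le u,\\[4pt]
1 - \dfrac{N_u}{n}\left(1 + \hat{\xi}\,\dfrac{x-u}{\hat{\sigma}}\right)^{-1/\hat{\xi}}, & x > u.
\end{cases}
\]
```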
- Item: Statistical inference of the multiple regression analysis of complex survey data (Stellenbosch : Stellenbosch University, 2016-12) Luus, Retha; De Wet, Tertius; Neethling, Ariane; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics & Actuarial Science. ENGLISH SUMMARY: The quality of the inferences and results put forward from any statistical analysis is directly dependent on the correct method used at the analysis stage. Most survey data analyzed in practice originate from stratified multistage cluster samples or complex samples. In developed countries the statistical analysis of complex sampling (CS) data, for example linear modelling, also known as survey-weighted least squares (SWLS) regression, has received some attention over time. In developing countries such as South Africa and the rest of Africa, SWLS regression is often confused with weighted least squares (WLS) regression or, in some extreme cases, the CS design is ignored and an ordinary least squares (OLS) model is fitted to the data. This is in contrast to what is found in the developed countries. Furthermore, especially in the developing countries, inference concerning the linear modelling of a continuous response is not as well documented as is the case for the inference of a categorical response, specifically in terms of a dichotomous response. Hence, the decision was made to research the linear modelling of a continuous response under CS with the objective of illustrating how the results could differ if the statistician ignores the complex design of the data or naively applies WLS in comparison to the correct SWLS regression. The complex sampling design leads to observations having unequal inclusion probabilities, the inverse of which is known as the design weight of an observation. Once adjusted for unit non-response and differential non-response, the sampling weights can have large variability that could have an adverse effect on the estimation precision. Weight trimming is cautiously recommended as a remedy for this, but could also increase the bias of an estimator, which then affects the estimation precision once more. The effect of weight trimming on estimation precision is also investigated in this research. Two important parts of regression analysis are researched here, namely the evaluation of the fitted model and the inference concerning the model parameters. The model evaluation part includes the adjustment of well-known prediction error estimation methods, viz. leave-one-out cross-validation, bootstrap estimation and .632 bootstrap estimation, for application to CS data. It also considers a number of outlier detection diagnostics such as the leverages and Cook's distance. The model parameter inference includes bootstrap variance estimation as well as the construction of bootstrap confidence intervals, viz. the percentile, bootstrap-t, and BCa confidence intervals. Two simulation studies are conducted in this thesis. For the first simulation study a model was developed and then used to simulate a hierarchical population such that stratified two-stage cluster samples can be selected from this population. The second simulation study makes use of stratified two-stage cluster samples that are sampled from real-world data, i.e. the Income and Expenditure Survey of 2005/2006 conducted by Statistics South Africa. Similar conclusions are drawn from both simulation studies.
These conclusions include that an incorrect linear model applied to CS data could lead to wrong conclusions, that weight trimming, when conducted with care, further improves estimation precision, and that linear modelling based on resampling methods such as the bootstrap could outperform standard linear modelling methods, especially when applied to real-world data.
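As context for the comparison above, the survey-weighted least squares point estimator has the same algebraic form as WLS with the design weights (the inverse inclusion probabilities); the key difference lies in design-based variance estimation, which accounts for the stratification and clustering rather than treating the weights as precision weights. A brief LaTeX statement of the point estimator:

```latex
% Survey-weighted least squares point estimator with design-weight matrix W = diag(w_1, ..., w_n),
% where w_i = 1 / pi_i is the inverse inclusion probability of observation i.
\[
\hat{\beta}_{\mathrm{SWLS}} \;=\; \left(X^{\top} W X\right)^{-1} X^{\top} W y .
\]
```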
- Item: Time series forecasting and model selection in singular spectrum analysis (Stellenbosch : Stellenbosch University, 2002-11) De Klerk, Jacques; De Wet, Tertius; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. ENGLISH ABSTRACT: Singular spectrum analysis (SSA) originated in the field of Physics. The technique is non-parametric by nature and inter alia finds application in atmospheric sciences, signal processing and recently in financial markets. The technique can handle a very broad class of time series that can contain combinations of complex periodicities, polynomial or exponential trend. Forecasting techniques are reviewed in this study, and a new coordinate-free joint-horizon k-period-ahead forecasting formulation is derived. The study also considers model selection in SSA, from which it becomes apparent that forward validation results in more stable model selection. The roots of SSA are outlined and distributional assumptions of signal series are considered ab initio. Pitfalls that arise in the multivariate statistical theory are identified. Different approaches of recurrent one-period-ahead forecasting are then reviewed. The forecasting approaches are all supplied in algorithmic form to ensure effortless adaptation to computer programs. Theoretical considerations, underlying the forecasting algorithms, are also considered. A new coordinate-free joint-horizon k-period-ahead forecasting formulation is derived and also adapted for the multichannel SSA case. Different model selection techniques are then considered. The use of scree-diagrams, phase space portraits, percentage variation explained by eigenvectors, cross and forward validation are considered in detail. The non-parametric nature of SSA essentially results in the use of non-parametric model selection techniques. Finally, the study also considers a commercial software package that is available and compares it with Fortran code, which was developed as part of the study.
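A minimal sketch of the basic SSA decomposition-and-reconstruction step (embed the series in a trajectory matrix, take an SVD, keep the leading components and Hankelize back); the window length L and number of components r are illustrative choices, and the forecasting recursions described in the abstract are not shown:

```python
import numpy as np

def ssa_reconstruct(x, L, r):
    """Reconstruct a series from the r leading SVD components of its L x K trajectory matrix."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    K = N - L + 1
    X = np.column_stack([x[i:i + L] for i in range(K)])   # trajectory (Hankel) matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :r] * s[:r]) @ Vt[:r, :]                   # rank-r approximation
    rec, counts = np.zeros(N), np.zeros(N)
    for i in range(L):                                    # diagonal averaging (Hankelization)
        for j in range(K):
            rec[i + j] += Xr[i, j]
            counts[i + j] += 1
    return rec / counts

t = np.arange(200)
series = np.sin(2 * np.pi * t / 20) + 0.01 * t + np.random.default_rng(3).normal(0, 0.3, 200)
smooth = ssa_reconstruct(series, L=40, r=3)               # keeps the trend plus one periodic pair
```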
- Item: Value at risk and expected shortfall : traditional measures and extreme value theory enhancements with a South African market application (Stellenbosch : Stellenbosch University, 2013-12) Dicks, Anelda; Conradie, W. J.; De Wet, Tertius; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. ENGLISH ABSTRACT: Accurate estimation of Value at Risk (VaR) and Expected Shortfall (ES) is critical in the management of extreme market risks. These risks occur with small probability, but the financial impacts could be large. Traditional models to estimate VaR and ES are investigated. Following usual practice, 99% 10-day VaR and ES measures are calculated. A comprehensive theoretical background is first provided and then the models are applied to the Africa Financials Index from 29/01/1996 to 30/04/2013. The models considered include independent, identically distributed (i.i.d.) models and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) stochastic volatility models. Extreme Value Theory (EVT) models that focus especially on extreme market returns are also investigated. For this, the Peaks Over Threshold (POT) approach to EVT is followed. For the calculation of VaR, various scaling methods from one day to ten days are considered and their performance evaluated. The GARCH models fail to converge during periods of extreme returns. During these periods, EVT forecast results may be used. As a novel approach, this study considers the augmentation of the GARCH models with EVT forecasts. The two-step procedure of pre-filtering with a GARCH model and then applying EVT, as suggested by McNeil (1999), is also investigated. This study identifies some of the practical issues in model fitting. It is shown that no single forecasting model is universally optimal and the choice will depend on the nature of the data. For this data series, the best approach was to augment the GARCH stochastic volatility models with EVT forecasts during periods where the former do not converge. Model performance is judged by the actual number of VaR and ES violations compared to the expected number. The expected number is taken as the number of return observations over the entire sample period, multiplied by 0.01 for 99% VaR and ES calculations.
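For reference, under the POT approach mentioned above, once a generalized Pareto tail with parameters (xi, sigma) has been fitted to the N_u exceedances of a threshold u, the standard estimators of VaR and ES at confidence level q (valid for xi < 1) take the form below; this is the textbook one-day, unconditional version, not the GARCH-filtered or ten-day scaled variants studied in the thesis:

```latex
% POT-based estimators of Value at Risk and Expected Shortfall at level q,
% from a GPD fit to the N_u exceedances of threshold u in a sample of size n.
\[
\widehat{\mathrm{VaR}}_{q} \;=\; u + \frac{\hat{\sigma}}{\hat{\xi}}
\left[\left(\frac{n}{N_u}\,(1-q)\right)^{-\hat{\xi}} - 1\right],
\qquad
\widehat{\mathrm{ES}}_{q} \;=\; \frac{\widehat{\mathrm{VaR}}_{q}}{1-\hat{\xi}}
\;+\; \frac{\hat{\sigma} - \hat{\xi}\,u}{1-\hat{\xi}}.
\]
```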