
# Department of Statistics and Actuarial Science

### Browsing Department of Statistics and Actuarial Science by Title

Now showing 1 - 20 of 169


- Advances in random forests with application to classification (Stellenbosch : Stellenbosch University, 2016-12) Pretorius, Arnu; Bierman, Surette; Steel, Sarel J.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics & Actuarial Science.
ENGLISH SUMMARY: Since their introduction, random forests have successfully been employed in a vast array of application areas. Fairly recently, a number of algorithms that adhere to Leo Breiman’s definition of a random forest have been proposed in the literature. Breiman’s popular random forest algorithm (Forest-RI), and related ensemble classification algorithms which followed, form the focus of this study. A review of random forest algorithms that were developed since the introduction of Forest-RI is given. This includes a novel taxonomy of random forest classification algorithms, which is based on their sources of randomization and on deterministic modifications. Also, a visual conceptualization of contributions to random forest algorithms in the literature is provided by means of multidimensional scaling. Towards an analysis of advances in random forest algorithms, decomposition of the expected prediction error into bias and variance components is considered. In classification, such decompositions are not as straightforward as in the case of using squared-error loss for regression. Hence various definitions of bias and variance for classification can be found in the literature. Using a particular bias-variance decomposition, an empirical study of ensemble learners, including bagging, boosting and Forest-RI, is presented. From the empirical results and insights into the way in which certain mechanisms of random forests affect bias and variance, a novel random forest framework, viz. oblique random rotation forests, is proposed. Although not entirely satisfactory, the framework serves as an example of a heuristic approach towards novel proposals based on bias-variance analyses, instead of an ad hoc approach, as is often found in the literature. The analysis of comparative studies regarding advances in random forest algorithms is also considered.
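The summary above refers to a bias-variance decomposition for classification under 0-1 loss. As an illustration only (this sketch is not taken from the thesis, and the function name is hypothetical), one common decomposition takes the main prediction at a test point to be the modal class over classifiers trained on different bootstrap samples, bias to be the 0-1 loss between the main prediction and the truth, and variance to be the average disagreement with the main prediction:

```python
from collections import Counter

def bias_variance_01(predictions, truth):
    """Domingos-style bias/variance under 0-1 loss at a single test point.

    predictions: class labels produced for this point by classifiers
                 trained on different bootstrap samples.
    truth:       the true class label.
    """
    # Main prediction: the modal (most frequent) predicted class.
    main = Counter(predictions).most_common(1)[0][0]
    bias = 0 if main == truth else 1                          # systematic error
    variance = sum(p != main for p in predictions) / len(predictions)
    return main, bias, variance

# Toy example: ten resampled classifiers vote on one test point.
preds = ["a", "a", "b", "a", "a", "c", "a", "b", "a", "a"]
main, bias, var = bias_variance_01(preds, truth="a")
```

Averaging these quantities over many test points gives the empirical bias and variance estimates that such a study would compare across bagging, boosting and Forest-RI.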
It is of interest to critically evaluate the conclusions that can be drawn from these studies, and to infer whether novel random forest algorithms are found to significantly outperform Forest-RI. For this purpose, a meta-analysis is conducted in which an evaluation is given of the state of research on random forests, based on all 34 papers that could be found in which a novel random forest algorithm was proposed and compared to already existing random forest algorithms. Using the reported performances in each paper, a novel two-step procedure is proposed, which allows for multiple algorithms to be compared over multiple data sets and across different papers. The meta-analysis results indicate that weighted voting strategies and variable weighting in high-dimensional settings provide significantly improved performance over that of Breiman’s popular Forest-RI algorithm.
- Advancing ESG objectives with ESG-linked derivatives (Stellenbosch : Stellenbosch University, 2024-03) Jansen van Rensburg, Pieter Willem; Alfeus, Mesias; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
ENGLISH SUMMARY: Environmental, social, and governance (ESG) concerns require the active involvement of the financial sector in advancing ESG compliance. The global derivatives market is of significant importance, and leveraging derivatives, exchanges, and clearinghouses offers a pathway for the financial sector to promote ESG objectives. This research assignment examines the particular challenges associated with utilizing derivatives to promote ESG compliance. It explores ESG-linked derivatives, utilizing Monte Carlo simulation, associated with Key Performance Indicators (KPIs) and priced based on the attainment of specific Sustainability Performance Targets (SPTs). These specific ESG-linked derivatives are tailored for this purpose: advancing ESG objectives. ESG-linked derivatives represent an emerging field, demanding further in-depth exploration.
- Analysing GARCH models across different sample sizes (Stellenbosch : Stellenbosch University, 2023-03) Purchase, Michael Andrew; Conradie, Willie; Viljoen, Helena; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
ENGLISH SUMMARY: First developed by Robert Engle (as the ARCH model) and extended by his student Tim Bollerslev, the GARCH model has the desired ability to model the changing variance (heteroskedasticity) of a time series. The primary goal of this study is to investigate changes in volatility, estimates of the parameters, forecasting error as well as excess kurtosis across different window lengths, as this may indicate an appropriate sample size to use when fitting a GARCH model to a set of data. After examining the T = 6489 one-day log-returns on the FTSE/JSE-ALSI between 27 December 1995 and 15 December 2021, it was calculated that an average estimate for volatility of 0.193670 should be expected. Given that a rolling window methodology was applied across 20 different window lengths under both the S-GARCH(1,1) and E-GARCH(1,1) models, a total of 180 000 GARCH models were fit, with parameter and volatility estimates, information criteria and volatility forecasts being extracted. Given the construction of the asymmetric response function under the E-GARCH model, this model has greater ability to account for the 'leverage effect', where negative market returns are greater drivers of higher volatility than positive returns of an equal magnitude. Among others, key results include volatility estimates across most window lengths taking longer to settle after the Global Financial Crisis (GFC) than after the COVID-19 pandemic. This was interesting because volatility reached higher levels during the latter, indicating that the South African market reacted more severely to the COVID-19 pandemic but also managed to adjust to new market conditions quicker than after the Global Financial Crisis.
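The rolling-window fits described above rest on the standard GARCH(1,1) variance recursion. As an illustrative sketch only (the return series and parameter values below are hypothetical, not the study's estimates; in practice the parameters are found by maximum likelihood with a dedicated package), the recursion can be filtered in a few lines:

```python
def garch11_variance(returns, omega, alpha, beta):
    """Filter conditional variances under a standard GARCH(1,1):
    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]."""
    # Initialise at the unconditional variance omega / (1 - alpha - beta).
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2

# Hypothetical daily log-returns and illustrative parameter values.
rets = [0.01, -0.02, 0.015, -0.03, 0.005]
sig2 = garch11_variance(rets, omega=1e-6, alpha=0.08, beta=0.90)
```

Repeating such a fit over rolling slices of the return series, for each of the 20 window lengths, is then a straightforward loop.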
In terms of parameter estimates under the S-GARCH(1,1) model, values for a and b under a window length of 100 trading days were often calculated arbitrarily close to zero and one respectively, indicating a strong possibility of the optimising algorithm arriving at local maxima of the likelihood function. With the exceptionally low p-values under the Jarque-Bera and Kolmogorov-Smirnov tests, as well as all excess kurtosis values being greater than zero, substantial motivation was provided for the use of the Student's t-distribution when fitting GARCH models. Given the various results obtained around volatility, parameter estimates, RMSE and information criteria, it was concluded that a window length of 600 is perhaps the most appropriate when modelling GARCH volatility.
- An analysis of income and poverty in South Africa (Stellenbosch : University of Stellenbosch, 2007-03) Malherbe, Jeanine Elizabeth; De Wet, Tertius; Viljoen, H.; Neethling, Ariane; University of Stellenbosch. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
The aim of this study is to assess the welfare of South Africa in terms of poverty and inequality. This is done using the Income and Expenditure Survey (IES) of 2000, released by Statistics South Africa, and reviewing the distribution of income in the country. A brief literature review of similar studies is given along with a broad definition of poverty and inequality. A detailed description of the dataset used is given together with aspects of concern surrounding the dataset. An analysis of poverty and income inequality is made using datasets containing the continuous income variable, as well as a created grouped income variable. Results from these datasets are compared and conclusions made on the use of continuous or grouped income variables. Covariate analysis is also applied in the form of biplots. A brief overview of biplots is given and it is then used to obtain a graphical description of the data and identify any patterns. Lastly, the conclusions made in this study are put forward and some future research is mentioned.
- Analysis to indicate the impact Hindsight Bias have on the outcome when forecasting of stock in the South African equity market (Stellenbosch : Stellenbosch University, 2023-12) Heyneke, Anton Lafrass; Conradie, Willie; Alfeus, Mesias; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
ENGLISH SUMMARY: A novel Artificial Neural Network (ANN) framework presented in this study has the ability to mimic the effect that cognitive biases, specifically hindsight bias, have on the financial market. This study investigates how hindsight bias influences models and their outcomes, measuring the hindsight bias effect within a South African context. The decisions that people make when faced with uncertainty are characterized by heuristic judgments and cognitive biases. If these characteristics are systematic and confirmed through research and literature related to this topic, they would form a quintessential part of the explanation of the behaviour of financial markets. This research presents a methodology that could be used to model the impact of cognitive biases on the financial markets. In this study, an ANN is used as a stand-in for the decision-making process of an investor. It is important to note that the selection of the companies on which the ANN is trained, validated and tested demonstrated cognitive bias during the study's preparation. Though there are many cognitive biases that have been identified in the literature on behavioural finance, this study concentrates solely on the impact of hindsight bias. On financial markets, hindsight bias manifests when outcomes seem more predictable after they have already happened. This study attempts, and to some degree succeeds, to replicate the return characteristics of the ten chosen companies for the assessment period from 2010 to 2021. The study described here may still be subject to various cognitive biases and systemic behavioural errors in addition to hindsight bias.
Further application of this technique should stimulate research into the influence of investor behaviour on financial markets.
- The application and testing of smart beta strategies in the South African market (Stellenbosch : Stellenbosch University, 2018-03) Viljoen, Jacobus; Conradie, W. J.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
ENGLISH SUMMARY: Smart beta portfolios have recently prompted great interest from both academic researchers and market practitioners. Investors are attracted by the performances produced by these portfolios compared to the traditional market capitalisation weighted indices. The question that this thesis attempts to answer is: Do smart beta portfolios outperform the traditional cap-weighted indices in the South African market? According to BlackRock’s smart beta guide (Ang, 2015), smart beta strategies aim to capture stock return drivers through rules-based, transparent strategies. They are generally long-only and usually implemented within an asset class, in the case of this assignment only equity. Smart beta is thus an investment strategy that positions itself between active and passive investing. Smart beta strategies are active in the sense that they invest in factors that drive return to improve risk-adjusted returns. In the same way, these strategies are closely related to passive strategies in that they are transparent, systematic and rules-based. In this assignment five different fundamental factor portfolios (value, quality, momentum, volatility and a combination of the four, called multi-factor) were created based on the smart beta methodology. The factors that were used are well researched in the market and have been proven to provide investors with excess return over the market. Firstly, stock selection was done using two different techniques (time series comparison and cross-sectional comparison). The best stocks were selected based on their fundamental factor characteristics. Secondly, two different smart beta weighting strategies as well as a market-cap weighting strategy were applied to the selected stocks in order to create the various portfolios. The risk and return characteristics of the created portfolios were compared to those of the two benchmarks (JSE All Share Index and the JSE Shareholder Weighted All Share Index).
The smart beta portfolios created in this thesis outperformed the benchmarks as well as the market-cap weighted portfolios. Lastly, the estimation of the macroeconomic exposure of the smart beta portfolios, using a methodology outlined in a Citi Research paper, is presented (Montagu, Krause, Burgess, Jalan, Murray, Chew and Yusuf, 2015).
- Application of cluster analysis and multidimensional scaling on medical schemes data (Stellenbosch : Stellenbosch University, 2008-12) Roux, Ian; Le Roux, N. J.; McLeod, H.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
Cluster analysis and multidimensional scaling (MDS) methods can be used to explore the structure in multidimensional data and can be applied to various fields of study. In this study, clustering techniques and MDS methods are applied to a data set from the health insurance field. This data set contains information on the number of medical scheme beneficiaries, between the ages of 55 and 59, that are treated for certain combinations of chronic diseases. Clustering techniques and MDS methods are used to describe the interrelations among these chronic diseases and to determine certain clusters of chronic diseases. Similarity or dissimilarity measures between the chronic diseases are constructed before the application of MDS methods or clustering techniques, because the chronic diseases are binary variables in the data set. The calculation of dissimilarities between the chronic diseases is based on various dissimilarity coefficients, where a different dissimilarity coefficient will produce a different set of dissimilarities. One of the aims of this study is to compare different dissimilarity coefficients, and it is shown that the Jaccard, Ochiai, Baroni-Urbani-Buser, Phi and Yule dissimilarity coefficients are most suitable for use on this particular data set. MDS methods are used to produce a lower-dimensional display space where the chronic diseases are represented by points and distances between these points give some measurement of similarity between the chronic diseases. The classical scaling, metric least squares scaling and nonmetric MDS methods are used in this study, and it is shown that the nonmetric MDS method is the most suitable MDS method for this particular data set. The Scaling by Majorizing a Complicated Function (SMACOF) algorithm is used to minimise the loss functions in this study and was found to perform well. Clustering techniques are used to provide information about the clustering structure of the chronic diseases.
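Because the chronic diseases are binary variables, dissimilarities such as the Jaccard coefficient named above can be computed directly from presence/absence indicators. A minimal sketch (the disease vectors below are hypothetical, not the medical schemes data):

```python
def jaccard_dissimilarity(x, y):
    """Jaccard dissimilarity between two binary vectors:
    1 - (joint presences) / (presences in either vector)."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # both present
    bc = sum(1 for xi, yi in zip(x, y) if xi != yi)            # mismatches
    if a + bc == 0:            # neither disease ever present
        return 0.0
    return bc / (a + bc)

# Hypothetical beneficiary indicators for two chronic diseases.
hypertension = [1, 1, 0, 1, 0, 1]
diabetes     = [1, 0, 0, 1, 0, 0]
d = jaccard_dissimilarity(hypertension, diabetes)
```

Computing this for every pair of diseases yields the dissimilarity matrix that MDS or a clustering algorithm would take as input; the other coefficients (Ochiai, Phi, Yule, etc.) differ only in how they weight the counts of matches and mismatches.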
Chronic diseases that are in the same cluster can be considered to be more similar, while chronic diseases in different clusters are more dissimilar. The robust clustering techniques PAM, FANNY, AGNES and DIANA are applied to the data set. It was found that AGNES and DIANA performed very well on the data set, while PAM and FANNY performed only marginally well.
- An application of copulas to improve PCA biplots for multivariate extremes (Stellenbosch : Stellenbosch University, 2018-12) Perrang, Justin; Van der Merwe, Carel Johannes; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
ENGLISH SUMMARY: Principal Component Analysis (PCA) biplots are a valuable means of visualising high-dimensional data. The application of PCA biplots over a wide variety of research areas containing multivariate data is well documented. However, the application of biplots to financial data is limited. This is partly due to PCA being an inadequate means of dimension reduction for multivariate data that is subject to extremes, which implies that its application to financial data is greatly diminished, since extreme observations are common in financial data. Hence, the purpose of this research is to develop a method to accommodate PCA biplots for multivariate data containing extreme observations. This is achieved by fitting an elliptical copula to the data and deriving a correlation matrix from the copula parameters. The copula parameters are estimated from only extreme observations, and as such the derived correlation matrices contain the dependencies of extreme observations. Finally, applying PCA to such an “extremal” correlation matrix more efficiently preserves the relationships underlying the extremes, and a more refined PCA biplot can be constructed.
- An application of geometric data analysis techniques to South African crime data (Stellenbosch : Stellenbosch University, 2016-12) Gurr, Benjamin William; Le Roux, Niel J.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics & Actuarial Science.
ENGLISH SUMMARY: Due to the high levels of violent crime in South Africa, improved methods of analysis are required in order to better scrutinize these statistics. This study diverges from traditional multivariate data analysis and provides alternative methods for analyzing crime data in South Africa. It explores the applications of several types of geometric data analysis (GDA) methods to the study of crime in South Africa; these include correspondence analysis, the correspondence analysis biplot, and the log-ratio biplot. Chapter 1 discusses the importance of data visualization in modern-day statistics, as well as geometric data analysis and its role as a multivariate analytical tool. Chapter 2 provides the motivation for the choice of subject matter to be explored in this study. As South Africa is recognized as having the eighth highest homicide rate in the world, along with a generally high level of violent crime, the analysis is conducted on reported violent crime statistics in South Africa. Additionally, the possible data collection challenges are also discussed in Chapter 2. The study is conducted on the violent crime statistics in South Africa for the 2004-2013 reporting period, the structure and details of which are discussed in Chapter 3. In order for this study to be comparable, it is imperative that the definitions of all crimes included are well defined. Chapter 3 therefore places a large emphasis on declaring the exact definition of the various crimes utilized in this study, as recorded by the South African Police Services. The more common approaches to graphically representing crime data in South Africa are explored in Chapter 4, which also marks the beginning of the analysis of the South African crime data for the 2004-2013 reporting period. Univariate graphical techniques (line graphs and bar plots) are used to analyze the data for the 2004-2013 time period.
However, as is to be expected, they are hampered by serious limitations. In an attempt to improve on the analysis, focus is shifted to geometric data analysis techniques. The general methodologies of correspondence analysis, biplots, and correspondence analysis biplots are discussed in Chapter 5. Both the algorithms and the construction of the associated figures are discussed for the aforementioned methods. The application of these methodologies is implemented in Chapter 6. The results of Chapter 6 suggest some improvement upon the results of Chapter 4. These techniques provided a geometric setting where both the crimes and provinces could be represented in a single diagram, and where the relationships between both sets of variables could be analyzed. The correspondence analysis biplot proved to have some advantages in comparison to the correspondence analysis maps, as it can display numerous metrics, provide multiple calibrated axes, and allows for greater manipulation of the figure itself. Chapter 7 introduced the concept of compositional data and the log-ratio biplot. The log-ratio biplot combined the functionality of the biplot with a comparability measure in terms of a ratio. The log-ratio biplot proved useful in the analysis of the South African crime data as it expressed differences on a ratio scale as multiplicative differences. Additionally, log-ratio analysis has the property of being sub-compositionally coherent. Chapter 8 provides the summary and conclusions of this study. It was found that Gauteng categorically has the largest number of reported violent crimes over the reported period (2004-2013). However, the Western Cape proved to have the highest violent crime rates per capita of all the South African provinces. It was noted that over the past decade South Africa has experienced a downward trend in the number of reported murders. However, there has been a spike in the number of reported cases of murder in more recent years.
This spike is mostly driven by the large increases in reported murder cases in the Western Cape, Gauteng and KwaZulu-Natal. The most notable trend seen in the South African crime data is the rapid increase in the number of reported cases of drug-related crimes over the reported period across all provinces, but more noticeably in the Western Cape and Gauteng. On the whole, the majority of South African provinces share similar violent crime profiles; however, Gauteng and the Western Cape deviate from the other provinces. This is due to Gauteng’s strong association with robbery with aggravating circumstances and the Western Cape’s strong association with drug-related crime. This study presents some evidence that the use of geometric data analysis techniques provides an improvement upon traditional reporting methods for the South African crime data. Geometric data analysis and its related methods should thus form an integral part of any study conducted into the topic at hand.
- Application of statistics and machine learning in healthcare (Stellenbosch : Stellenbosch University, 2019-04) Van der Merwe, Schalk Gerhardus; Muller, Chris; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
ENGLISH SUMMARY: Clinical performance and cost efficiency are key focus areas in the healthcare industry, since providing quality and affordable healthcare is a continuing challenge. The goal of this research is to use statistical analyses and modelling to improve efficiency in healthcare by focussing on readmissions. Patients readmitted to hospital can indicate poor clinical care and have immense cost implications, so it is advantageous if readmissions can be kept to a minimum. Generally, stakeholders view strategies to address the clinical performance of healthcare providers, such as readmission rate, as mainly clinical in nature. However, this study investigates the potential role of machine learning in the improvement of clinical outcomes. This study defines machine learning as the identification of complex patterns (linear or non-linear) present in observed data, with the goal of predicting a certain outcome for new cases by mimicking the true underlying pattern in the population which led to the observed outcomes in the sample, while limiting rigid structural assumptions throughout. The question at hand is whether patients that are at risk of readmission can be identified, along with the risk factors that can be associated with an increase in the likelihood of readmission occurring. If so, this provides an opportunity to reduce the number of readmissions and thus avoid the resulting cost and clinical consequences. Once a patient is identified as at risk of readmission, there is an opportunity for early clinical intervention. In addition, the model provides the opportunity to calculate risk scores for patients, which in turn enables risk adjustment of the readmission rates reported. The data under consideration in this study is healthcare data generated by the operations of an international healthcare provider, Mediclinic International.
The research is based on patient data captured at hospital level in all Mediclinic hospitals operating on Mediclinic International’s Southern African platform. Several statistical algorithms exist to model the responses of interest, ranging from simple, well-known techniques to more advanced ones. Logistic regression and decision trees are examples of simple techniques, while neural networks and support vector machines (SVM) are more complex. SAS Enterprise Guide is the software of choice for the data preparation, while SAS Enterprise Miner is the software used for the machine learning component of this study. The study aims to provide insight into machine learning techniques, as well as to construct machine learning models that produce reasonable accuracy in terms of prediction of readmissions.
- Application of the moving block bootstrap method to resampled efficiency : the impact of the choice of block size (Stellenbosch : Stellenbosch University, 2021-12) Retief, Jan; Conradie, W. J.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
ENGLISH SUMMARY: Modern portfolio theory was first developed in the 1950s and revolutionised the way in which financial information is used to construct portfolios. Unfortunately, the theory is limited by the sensitivity of the constructed portfolio’s weights to uncertainty in the constituents’ risk and return estimates. Various advancements to the classical theory have been proposed to address this problem. One of these methods is called Resampled Efficiency (RE), which addresses the sensitivity problem by sampling expected return and risk estimates for each security included in the portfolio. Multiple portfolios are then built based on the sampled returns to construct a single averaged portfolio. The result is more robust portfolios that have been proven to have better out-of-sample performance. There are two methods available for sampling the security expected returns and risk: (1) generating random security returns (via Monte Carlo methods) or (2) using bootstrapping techniques based on observed security returns. For the second method, the moving block bootstrap (MBB) method can be used to construct bootstrapped samples for a non-stationary series of security returns. The MBB method works by ordering the historical series of observed returns into blocks of a pre-defined size. As such, the choice of block size can have a significant effect on the sample that is obtained and used for portfolio construction. The goal of this study was to fully investigate what impact the choice of block size can have on the out-of-sample performance of resampled efficiency portfolios. After a literature review that assessed modern portfolio theory, resampled efficiency and the moving block bootstrap method, RE portfolios were hypothetically built based on actual security return observations. The constituents of the FTSE/JSE Top 40 index were used to construct RE portfolios for different choices of block sizes for the period between 2016 and 2017.
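The moving block bootstrap described above can be sketched in a few lines: blocks of consecutive observations are drawn, with replacement, from all overlapping blocks of the chosen size and concatenated until the resample reaches the original length. This is an illustrative sketch under stated assumptions, not the thesis code, and the return series below is hypothetical:

```python
import random

def moving_block_bootstrap(series, block_size, rng=None):
    """Resample a series by concatenating randomly chosen overlapping
    blocks of consecutive observations (moving block bootstrap)."""
    rng = rng or random.Random(42)
    n = len(series)
    if not 1 <= block_size <= n:
        raise ValueError("block_size must be between 1 and len(series)")
    starts = range(n - block_size + 1)       # all overlapping block starts
    sample = []
    while len(sample) < n:
        s = rng.choice(starts)               # pick a block start at random
        sample.extend(series[s:s + block_size])
    return sample[:n]                        # trim to the original length

# Hypothetical daily returns; block_size is the tuning choice under study.
returns = [0.012, -0.004, 0.007, -0.011, 0.003, 0.009, -0.002, 0.005]
resampled = moving_block_bootstrap(returns, block_size=3)
```

Because each block keeps consecutive observations together, serial dependence within a block is preserved, which is exactly why the block size choice matters for the resulting RE portfolios.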
The results indicate that the block size used can have a significant impact on the out-of-sample performance of the constructed portfolios; however, no single block size or range of block sizes could be found that consistently results in the best-performing RE portfolios. The ideal block size differs for different periods and different levels of risk.
- The appropriateness of ISDA SIMM for delta risk initial margin calculations in the South African over-the-counter interest rate swap market (Stellenbosch : Stellenbosch University, 2020-12) Cronje, Robert; Van der Merwe, Carel Johannes; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
ENGLISH SUMMARY: This research assignment assesses the appropriateness of the calibrations in the ISDA SIMM for calculating delta risk initial margin (IM) in the current over-the-counter interest rate swap market in South Africa. Three main experiments are conducted that include novel ways of delineating and uncovering potential risks in the ISDA SIMM. By comparing the delta risk IM obtained using the standard model and that of a filtered historical simulation expected shortfall model that is calibrated to the South African swaps index curve, the IM appropriateness can be inspected for various profiles based on their relative sensitivities to the tenors of the swap curve. The experiments show that the ISDA SIMM is appropriate in most cases, but due to its broad calibrations, some shortfalls are shown to exist. The results are standardised throughout and are independent of absolute size, as liquidity and concentration features are deliberately excluded. This makes the results more generally applicable and also makes all the results obtained in the analyses comparable. The framework developed here can be replicated by practitioners using their own systems in order to obtain results that meet their internal calibrations as well as their specific risk and return requirements.
- Aspects of copulas and goodness-of-fit (Stellenbosch : Stellenbosch University, 2008-12) Kpanzou, Tchilabalo Abozou; De Wet, Tertius; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
The goodness-of-fit of a statistical model describes how well it fits a set of observations. Measures of goodness-of-fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, for example to test for normality, to test whether two samples are drawn from identical distributions, or whether outcome frequencies follow a specified distribution. Goodness-of-fit for copulas is a special case of the more general problem of testing multivariate models, but is complicated due to the difficulty of specifying marginal distributions. In this thesis, the goodness-of-fit test statistics for general distributions and the tests for copulas are investigated, but prior to that an understanding of copulas and their properties is developed. In fact copulas are useful tools for understanding relationships among multivariate variables, and are important tools for describing the dependence structure between random variables. Several univariate, bivariate and multivariate test statistics are investigated, the emphasis being on tests for normality. Among goodness-of-fit tests for copulas, tests based on the probability integral transform, Rosenblatt's transformation, as well as some dimension reduction techniques are considered. Bootstrap procedures are also described. Simulation studies are conducted to first compare the power of rejection of the null hypothesis of the Clayton copula by four different test statistics under the alternative of the Gumbel-Hougaard copula, and also to compare the power of rejection of the null hypothesis of the Gumbel-Hougaard copula under the alternative of the Clayton copula.
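One of the copulas compared in these simulation studies, the Clayton copula, can be sampled by conditional inversion, which is how data under the null or alternative hypothesis might be generated. A minimal sketch only (illustrative, not the thesis code):

```python
import random

def sample_clayton(theta, n, rng=None):
    """Draw n pairs (u, v) from a Clayton copula with parameter theta > 0,
    using conditional inversion:
        u ~ U(0,1), w ~ U(0,1),
        v = ((w**(-theta/(1+theta)) - 1) * u**(-theta) + 1)**(-1/theta)
    """
    rng = rng or random.Random(0)
    pairs = []
    for _ in range(n):
        u, w = rng.random(), rng.random()
        # Invert the conditional distribution C(v | u) at w.
        v = ((w ** (-theta / (1.0 + theta)) - 1.0)
             * u ** (-theta) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs

# Larger theta gives stronger lower-tail dependence.
pairs = sample_clayton(theta=2.0, n=5)
```

A goodness-of-fit simulation would generate many such samples under one copula, apply each test statistic, and record how often the competing copula is rejected.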
An application of the described techniques is made to a practical data set.
- Item Aspects of model development using regression quantiles and elemental regressions (Stellenbosch : Stellenbosch University, 2007-03) Ranganai, Edmore; De Wet, Tertius; Van Vuuren, J.O.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
ENGLISH ABSTRACT: It is well known that ordinary least squares (OLS) procedures are sensitive to deviations from the classical Gaussian assumptions (outliers) as well as data aberrations in the design space. The two major data aberrations in the design space are collinearity and high leverage. Leverage points can also induce or hide collinearity in the design space; such leverage points are referred to as collinearity-influential points. As a consequence, over the years many diagnostic tools to detect these anomalies, as well as alternative procedures to counter them, were developed. To counter deviations from the classical Gaussian assumptions, many robust procedures have been proposed. One such class of procedures is the Koenker and Bassett (1978) Regression Quantiles (RQs), which are natural extensions of order statistics to the linear model. RQs can be found as solutions to linear programming problems (LPs). The basic optimal solutions to these LPs (which are RQs) correspond to elemental subset (ES) regressions, which consist of subsets of minimum size to estimate the necessary parameters of the model. On the one hand, some ESs correspond to RQs. On the other hand, it is shown in the literature that many OLS statistics (estimators) are related to ES regression statistics (estimators). There is therefore an inherent relationship amongst the three sets of procedures. The relationship between the ES procedure and the RQ one has been noted almost “casually” in the literature, while the latter relationship has been fairly widely explored. Using these existing relationships between the ES procedure and the OLS one, as well as new ones, collinearity, leverage and outlier problems in the RQ scenario were investigated. A lasso procedure was also proposed as a variable selection technique in the RQ scenario, and some tentative, promising results were given for it. Single-case diagnostics were considered, as well as their relationships to multiple-case ones.
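The LP formulation of RQs mentioned above can be sketched in a few lines. The following illustration (hypothetical code, not from the thesis; it assumes scipy's `linprog` with the HiGHS solver) computes a Koenker-Bassett regression quantile by splitting the coefficients and residuals into non-negative parts; at a basic optimal solution the fitted τ-quantile hyperplane interpolates p observations, i.e. it corresponds to an elemental subset:

```python
import numpy as np
from scipy.optimize import linprog

def regression_quantile(X, y, tau):
    """Koenker-Bassett regression quantile via linear programming.

    Minimise  tau * 1'u + (1 - tau) * 1'v
    subject to  X (b_pos - b_neg) + u - v = y,  all variables >= 0,
    where u and v are the positive and negative residual parts.
    """
    n, p = X.shape
    c = np.concatenate([np.zeros(2 * p), np.full(n, tau), np.full(n, 1 - tau)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:p] - res.x[p:2 * p], res

# Median regression (tau = 0.5) on simulated heavy-tailed data.
rng = np.random.default_rng(0)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.standard_t(df=3, size=n)
beta, res = regression_quantile(X, y, tau=0.5)
r = y - X @ beta
# At a basic optimal solution the fitted line interpolates p = 2 observations,
# i.e. the RQ corresponds to an elemental-subset regression.
```

The zero residuals at the interpolated observations are exactly the ES/RQ connection exploited in the diagnostics discussed here.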
In particular, multiple cases of the minimum size needed to estimate the necessary parameters of the model were considered, corresponding to an RQ (ES). In this way regression diagnostics were developed for both ESs and RQs. The main problems that affect RQs adversely are collinearity and leverage, due to the nature of the computational procedures and the fact that the influence functions of RQs are unbounded in the design space but bounded in the response variable. As a consequence, RQs have a high affinity for leverage points and a high exclusion rate of outliers. The influential picture exhibited in the presence of both leverage points and outliers is the net result of these two antagonistic forces. Although RQs are bounded in the response variable (and therefore fairly robust to outliers), outlier diagnostics were also considered in order to obtain a more holistic picture. The investigations comprised analytic methods as well as simulation. Furthermore, applications were made to artificial computer-generated data sets as well as standard data sets from the literature. These revealed that the ES-based statistics can be used to address problems arising in the RQ scenario with some degree of success. However, due to the interdependence between the different aspects, viz. that between leverage and collinearity and that between leverage and outliers, “solutions” are often dependent on the particular situation. In spite of this complexity, the research did produce some fairly general guidelines that can be fruitfully used in practice.
- Item Aspects of multi-class nearest hypersphere classification (Stellenbosch : Stellenbosch University, 2017-12) Coetzer, Frances; Lamont, Morné Michael Connell; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
ENGLISH SUMMARY : Using hyperspheres in the analysis of multivariate data is not common practice in Statistics. However, hyperspheres have some interesting properties which are useful for data analysis in the following areas: domain description (finding a support region), detecting outliers (novelty detection) and the classification of objects into known classes. This thesis demonstrates how a hypersphere is fitted around a single dataset to obtain a support region and an outlier detector. The all-enclosing and 𝜐-soft hyperspheres are derived. The hyperspheres are then extended to multi-class classification, which is called nearest hypersphere classification (NHC). Different aspects of multi-class NHC are investigated. To study the classification performance of NHC, we compare it to three other classification techniques: support vector machine classification, random forests and penalised linear discriminant analysis. Using NHC requires choosing a kernel function; in this thesis the Gaussian kernel is used. NHC also depends on selecting an appropriate kernel hyperparameter 𝛾 and a tuning parameter 𝐶. The behaviour of the error rate and the fraction of support vectors for different values of 𝛾 and 𝐶 is investigated. Two methods are investigated to obtain the optimal 𝛾 value for NHC. The first uses a differential evolution procedure, executed with the R function DEoptim(); the second uses the R function sigest(). The first method is dependent on the classification technique, while the second is executed independently of it.
- Item Aspects of some exotic options (Stellenbosch : University of Stellenbosch, 2007-12) Theron, Nadia; Conradie, W. J.; University of Stellenbosch. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
The use of options on various stock markets around the world has introduced a unique opportunity for investors to hedge, speculate, create synthetic financial instruments and reduce funding and other costs in their trading strategies. The power of options lies in their versatility: they enable an investor to adapt or adjust her position according to any situation that arises. Another benefit of using options is that they provide leverage. Since options cost less than stock, they provide a high-leverage approach to trading that can significantly limit the overall risk of a trade, or provide additional income. This versatility and leverage, however, come at a price: options are complex securities and can be extremely risky. In this document several aspects of trading and valuing some exotic options are investigated. The aim is to give insight into their uses and the risks involved in their trading. Two volatility-dependent derivatives, namely compound and chooser options; two path-dependent derivatives, namely barrier and Asian options; and lastly binary options, are discussed in detail. The purpose of this study is to provide a reference that contains both the mathematical derivations and the detail involved in valuing these exotic options, as well as an overview of their applicability and use, for students and other interested parties.
- Item Assessing the influence of observations on the generalization performance of the kernel Fisher discriminant classifier (Stellenbosch : Stellenbosch University, 2008-12) Lamont, Morné Michael Connell; Louw, Nelmarie; Steel, Sarel; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
Kernel Fisher discriminant analysis (KFDA) is a kernel-based technique that can be used to classify observations of unknown origin into predefined groups. Basically, KFDA can be viewed as a non-linear extension of Fisher’s linear discriminant analysis (FLDA). In this thesis we give a detailed explanation of how FLDA is generalized to obtain KFDA. We also discuss two methods that are related to KFDA. Our focus is on binary classification. The influence of atypical cases in discriminant analysis has been investigated by many researchers. In this thesis we investigate the influence of atypical cases on certain aspects of KFDA. One important aspect of interest is the generalization performance of the KFD classifier. Several other aspects are also investigated, with the aim of developing criteria that can be used to identify cases that are detrimental to the KFD generalization performance. The investigation is done via a Monte Carlo simulation study. The output of KFDA can also be used to obtain the posterior probabilities of belonging to the two classes. In this thesis we discuss two approaches to estimating posterior probabilities in KFDA. Two new KFD classifiers are also derived which use these probabilities to classify observations, and their performance is compared to that of the original KFD classifier. The main objective of this thesis is to develop criteria which can be used to identify cases that are detrimental to the KFD generalization performance. Nine such criteria are proposed and their merit investigated in a Monte Carlo simulation study as well as on real-world data sets. Evaluating the criteria on a leave-one-out basis poses a computational challenge, especially for large data sets. In this thesis we also propose using the smallest enclosing hypersphere as a filter to reduce the amount of computation.
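The generalization from FLDA to KFDA can be illustrated with a small sketch (an assumed regularised formulation, not the thesis's exact derivation): with kernel matrix K, class kernel means m₁ and m₀ and within-class scatter N, the discriminant coefficients solve α = (N + λI)⁻¹(m₁ − m₀), and new points are classified by thresholding the projection midway between the projected class means:

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def kfda_fit(X, y, gamma=0.5, lam=1e-3):
    """Binary KFDA: solve alpha = (N + lam I)^{-1} (m1 - m0)."""
    K = rbf_kernel(X, X, gamma)
    idx1, idx0 = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    m1, m0 = K[:, idx1].mean(axis=1), K[:, idx0].mean(axis=1)
    N = np.zeros_like(K)
    for idx in (idx1, idx0):
        Kc = K[:, idx]                          # kernel columns for this class
        H = np.eye(len(idx)) - 1.0 / len(idx)   # centring matrix I - 11'/n_c
        N += Kc @ H @ Kc.T
    alpha = np.linalg.solve(N + lam * np.eye(len(X)), m1 - m0)
    b = -0.5 * (alpha @ m1 + alpha @ m0)        # threshold between projected means
    return alpha, b

def kfda_predict(X_train, alpha, b, X_new, gamma=0.5):
    """Classify by the sign of the kernelised discriminant projection."""
    return (rbf_kernel(X_new, X_train, gamma) @ alpha + b > 0).astype(int)

# Two well-separated Gaussian clouds as a toy binary problem.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(3.0, 1.0, (30, 2))])
y = np.array([1] * 30 + [0] * 30)
alpha, b = kfda_fit(X, y)
accuracy = (kfda_predict(X, alpha, b, X) == y).mean()
```

The ridge term λI plays the regularisation role needed because N is singular in the kernel setting; the projection f(x) = Σᵢ αᵢ k(xᵢ, x) is the quantity whose class-conditional distribution underlies the posterior-probability estimates discussed above.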
The effectiveness of the filter is tested in a Monte Carlo simulation study as well as on real-world data sets.
- Item Basic in-mouth attribute evaluation : a comparison of two panels (MDPI, 2018-12-21) Mihnea, Mihaela; Aleixandre-Tudo, Jose Luis; Kidd, Martin; Du Toit, Wessel
Astringency is often difficult to evaluate accurately in wine because of its complexity. Accuracy can be improved through training sessions, but these can be time-consuming and expensive. One way to reduce these costs is to use wine experts, who are known to be reliable evaluators. The aim of this work was therefore to compare the sensory results and the panel performance obtained using trained panelists versus wine experts (winemakers). Judges evaluated twelve red wines for basic in-mouth perception (sweet, sour, bitter, astringent, and burning sensation) following the same tasting protocol, with the samples presented in two different tasting modalities. Panel performance and the relationship between chemical composition and sensory perception were investigated. Both panels showed similar consistency and repeatability, and both were able to accurately measure the astringency of the wines. However, the significant correlations between sensory scores and chemical composition varied with the panel and the tasting modality. Our results show that winemakers tended to discriminate better between the samples when the differences were very small.
- Item Bayesian approaches of Markov models embedded in unbalanced panel data (Stellenbosch : Stellenbosch University, 2012-12) Muller, Christoffel Joseph Brand; Mostert, Paul J.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
ENGLISH ABSTRACT: Multi-state models are used in this dissertation to model panel data, also known as longitudinal or cross-sectional time-series data. These are data sets which include units that are observed across two or more points in time. Such models have been used extensively in medical studies where the disease states of patients are recorded over time. A theoretical overview of current multi-state Markov models as applied to panel data is presented, and based on this theory a simulation procedure is developed to generate panel data sets for given Markov models. Using this procedure, a simulation study is undertaken to investigate the properties of the standard likelihood approach when fitting Markov models, and to assess its shortcomings. One of the main shortcomings highlighted by the simulation study is the instability of the estimates obtained by the standard likelihood models, especially when fitted to small data sets. A Bayesian approach is introduced to develop multi-state models that can overcome these unstable estimates by incorporating prior knowledge into the modelling process. Two Bayesian techniques are developed and presented, and their properties are assessed through extensive simulation studies. Firstly, Bayesian multi-state models are developed by specifying prior distributions for the transition rates, constructing a likelihood using standard Markov theory and then obtaining the posterior distributions of the transition rates; a selection of priors is used in these models. Secondly, Bayesian multi-state imputation techniques are presented that make use of suitable prior information to impute missing observations in the panel data sets. Once imputed, standard likelihood-based Markov models are fitted to the imputed data sets to estimate the transition rates. Two different Bayesian imputation techniques are presented.
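The posterior-updating idea behind the first Bayesian technique (priors on the transition structure combined with a Markov likelihood) can be illustrated in a simplified discrete-time analogue. The thesis works with continuous-time transition rates; the conjugate Dirichlet update below is only a hypothetical sketch of the same principle:

```python
import numpy as np

def posterior_transition_matrix(sequences, n_states, prior=1.0):
    """Posterior mean transition matrix under row-wise Dirichlet priors.

    Each row of the transition matrix gets an independent Dirichlet(prior, ...)
    prior; with observed transition counts c_ij, the posterior for row i is
    Dirichlet(prior + c_i1, ..., prior + c_in), whose mean is returned.
    """
    counts = np.full((n_states, n_states), prior, dtype=float)
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

# Three short panel sequences over states {0, 1, 2}.
panels = [[0, 0, 1, 1, 2], [0, 1, 2, 2], [0, 0, 0, 1]]
P = posterior_transition_matrix(panels, n_states=3)
```

Because the prior pseudo-counts dominate when data are scarce, the posterior estimates remain stable even for very small panels, which is the behaviour the simulation studies examine.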
The first approach makes use of the Dirichlet distribution and imputes the unknown states at all time points with missing observations. The second approach uses a Dirichlet process to estimate the time at which a transition occurred between two known observations, and then imputes a state at that estimated transition time. The simulation studies show that these Bayesian methods produce more stable estimates, even when only small samples are available.
- Item A Bayesian extreme value approach to the optimal reinsurance problem in a multivariate risk setting (Stellenbosch : Stellenbosch University, 2023-12) Steenkamp, Shaun Francois; Harvey, Justin; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
ENGLISH SUMMARY: This thesis investigates a Bayesian extreme value theory approach to the optimal reinsurance problem, more specifically the optimal layer selection of an excess of loss (XL) reinsurance contract. A simulation approach to the optimization of the layer selection is suggested. The thesis proposes a multivariate XL reinsurance structure, referred to as the simultaneous XL reinsurance structure, and applies the developed optimization algorithm to this structure in several numerical examples. The approach focuses in particular on extreme risks, thereby investigating the optimal reinsurance contract that best protects the insurance company from rare large claims. The methodology is explained for a univariate risk case, after which the model is extended to the bivariate and multivariate risk cases. The optimal reinsurance agreement can be investigated using a variety of different models; this thesis develops a risk measure minimization model, with a focus on the conditional tail expectation (CTE) risk measure. The model allows for the insurance company’s reinsurance budget as a constraint in the optimization problem. Bayesian techniques are especially useful in problems where data is sparse, and this thesis therefore suggests a Bayesian approach to the optimal reinsurance problem where rare large claims are considered. A Bayesian extreme value theory approach could improve the investigation of the optimal reinsurance problem by utilising Markov chain Monte Carlo (MCMC) methods to supplement the information in the data that the insurance company has available. The approach is extended to the bivariate and multivariate risk cases, where a fictitious insurer involved in various lines of business is considered. The dependence structure is modelled using a copula approach. Numerical examples are examined, and the results are interpreted.
This thesis focuses on the tail of the data, evaluating the optimal excess of loss reinsurance contract for very large claims with very small probabilities. The research suggests an algorithm for evaluating the optimal reinsurance strategy in a multivariate risk environment for insurance companies involved in different lines of business. The analysis will improve understanding of, and assist decision-making on, the reinsurance strategy from the insurer’s perspective.
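The CTE-minimisation step can be sketched empirically (hypothetical figures; the thesis's model is Bayesian, multivariate and copula-based, whereas this toy version grid-searches a single retention level over simulated heavy-tailed losses under a premium budget):

```python
import numpy as np

def cte(losses, p=0.95):
    """Conditional tail expectation: mean loss beyond the empirical p-quantile."""
    q = np.quantile(losses, p)
    return losses[losses >= q].mean()

def optimal_retention(X, limit, budget, loading=1.2, p=0.95):
    """Grid-search the retention d of an XL layer (d, d + limit] that minimises
    the CTE of the retained loss subject to a reinsurance premium budget."""
    best = None
    for d in np.linspace(0.0, np.quantile(X, 0.99), 100):
        ceded = np.clip(X - d, 0.0, limit)      # reinsurer's share per claim
        premium = loading * ceded.mean()        # expected-value premium principle
        if premium > budget:
            continue                            # layer not affordable
        risk = cte(X - ceded, p)                # tail risk of retained loss
        if best is None or risk < best[1]:
            best = (d, risk, premium)
    return best

rng = np.random.default_rng(1)
X = rng.pareto(2.5, size=100_000) * 10.0        # heavy-tailed claim severities
d, risk, premium = optimal_retention(X, limit=50.0, budget=5.0)
```

The returned retention is the attachment point of the cheapest-tail-risk layer affordable within the budget; in the thesis this search is driven by posterior (MCMC) draws of the extreme-value loss model rather than a single simulated sample.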