Masters Degrees (Statistics and Actuarial Science)
Browsing Masters Degrees (Statistics and Actuarial Science) by Title
Now showing 1 - 20 of 101
- ItemAdvances in random forests with application to classification(Stellenbosch : Stellenbosch University, 2016-12) Pretorius, Arnu; Bierman, Surette; Steel, Sarel J.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics & Actuarial Science.ENGLISH SUMMARY : Since their introduction, random forests have successfully been employed in a vast array of application areas. Fairly recently, a number of algorithms that adhere to Leo Breiman’s definition of a random forest have been proposed in the literature. Breiman’s popular random forest algorithm (Forest-RI), and related ensemble classification algorithms which followed, form the focus of this study. A review of random forest algorithms that were developed since the introduction of Forest-RI is given. This includes a novel taxonomy of random forest classification algorithms, which is based on their sources of randomization, and on deterministic modifications. Also, a visual conceptualization of contributions to random forest algorithms in the literature is provided by means of multidimensional scaling. Towards an analysis of advances in random forest algorithms, decomposition of the expected prediction error into bias and variance components is considered. In classification, such decompositions are not as straightforward as in the case of using squared-error loss for regression. Hence various definitions of bias and variance for classification can be found in the literature. Using a particular bias-variance decomposition, an empirical study of ensemble learners, including bagging, boosting and Forest-RI, is presented. From the empirical results and insights into the way in which certain mechanisms of random forests affect bias and variance, a novel random forest framework, viz. oblique random rotation forests, is proposed. Although not entirely satisfactory, the framework serves as an example of a heuristic approach towards novel proposals based on bias-variance analyses, instead of an ad hoc approach, as is often found in the literature. The analysis of comparative studies regarding advances in random forest algorithms is also considered. It is of interest to critically evaluate the conclusions that can be drawn from these studies, and to infer whether novel random forest algorithms are found to significantly outperform Forest-RI. For this purpose, a meta-analysis is conducted in which an evaluation is given of the state of research on random forests, based on all (34) papers that could be found in which a novel random forest algorithm was proposed and compared to already existing random forest algorithms. Using the reported performances in each paper, a novel two-step procedure is proposed, which allows for multiple algorithms to be compared over multiple data sets, and across different papers. The meta-analysis results indicate that weighted voting strategies, and variable weighting in high-dimensional settings, provide significantly improved performance over that of Breiman’s popular Forest-RI algorithm.
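To make the bias-variance analysis for classifiers concrete, the following is a minimal sketch (not the thesis code): it estimates 0-1-loss bias and variance by refitting a single tree, bagging and a Forest-RI-style random forest on bootstrap training sets, using one common convention in which the "main prediction" is the majority vote over refits. The dataset and the particular decomposition are illustrative assumptions.

```python
# Sketch: estimating bias and variance of classifiers under 0-1 loss.
# Assumptions: scikit-learn, synthetic data, and a Domingos-style convention
# (bias = main prediction disagrees with truth, variance = disagreement with main prediction).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, n_features=20, random_state=0)
X_train, y_train, X_test, y_test = X[:1000], y[:1000], X[1000:], y[1000:]

def bias_variance_01(model, n_rounds=50, seed=1):
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_rounds):                      # refit on bootstrap training sets
        idx = rng.integers(0, len(X_train), len(X_train))
        preds.append(model.fit(X_train[idx], y_train[idx]).predict(X_test))
    preds = np.array(preds)                        # shape (n_rounds, n_test)
    main = (preds.mean(axis=0) > 0.5).astype(int)  # majority-vote "main prediction"
    bias = np.mean(main != y_test)                 # main prediction wrong
    variance = np.mean(preds != main)              # disagreement with main prediction
    return bias, variance

for name, clf in [("tree", DecisionTreeClassifier()),
                  ("bagging", BaggingClassifier(n_estimators=50)),
                  ("Forest-RI-style", RandomForestClassifier(n_estimators=50))]:
    b, v = bias_variance_01(clf)
    print(f"{name:16s} bias={b:.3f} variance={v:.3f}")
```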
- ItemAdvancing ESG objectives with ESG-linked derivatives(Stellenbosch : Stellenbosch University, 2024-03) Jansen van Rensburg, Pieter Willem; Mesias, Alfeus; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.ENGLISH SUMMARY: Environmental, social, and governance (ESG) concerns require the active involvement of the financial sector in advancing ESG compliance. The global derivative market holds significant importance. Leveraging derivatives, exchanges, and clearinghouses offers a pathway for the financial sector to promote ESG objectives. This research assignment examines the particular challenges associated with utilizing derivatives to promote ESG compliance. It explores ESG-linked derivatives, utilizing Monte Carlo simulation, associated with Key Performance Indicators (KPIs) and priced based on the attainment of specific Sustainability Performance Targets (SPTs). These specific ESG-linked derivatives are tailored for this purpose: advancing ESG objectives. ESG-linked derivatives represent an emerging field, demanding further in-depth exploration.
- ItemAnalysing GARCH models across different sample sizes(Stellenbosch : Stellenbosch University, 2023-03) Purchase, Michael Andrew; Conradie, Willie; Viljoen, Helena; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.ENGLISH SUMMARY: As initially constructed by Robert Engle and his student Tim Bollerslev, the GARCH model has the desired ability to model the changing variance (heteroskedasticity) of a time series. The primary goal of this study is to investigate changes in volatility, estimates of the parameters, forecasting error as well as excess kurtosis across different window lengths, as this may indicate an appropriate sample size to use when fitting a GARCH model to a set of data. After examining the T = 6489 1-day log-returns on the FTSE/JSE-ALSI between 27 December 1995 and 15 December 2021, it was calculated that an average volatility estimate of 0.193 670 should be expected. Given that a rolling window methodology was applied across 20 different window lengths under both the S-GARCH(1,1) and E-GARCH(1,1) models, a total of 180 000 GARCH models were fit, with parameter and volatility estimates, information criteria and volatility forecasts being extracted. Given the construction of the asymmetric response function under the E-GARCH model, this model has greater ability to account for the 'leverage effect', where negative market returns are greater drivers of higher volatility than positive returns of an equal magnitude. Among others, key results include volatility estimates across most window lengths taking longer to settle after the Global Financial Crisis (GFC) than after the COVID-19 pandemic. This was interesting because volatility reached higher levels during the latter, indicating that the South African market reacted more severely to the COVID-19 pandemic but also managed to adjust to new market conditions quicker than after the Global Financial Crisis. In terms of parameter estimates under the S-GARCH(1,1) model, values for a and b under a window length of 100 trading days were often estimated extremely close to zero and one respectively, indicating a strong possibility of the optimising algorithm arriving at local maxima of the likelihood function. With the exceptionally low p-values under the Jarque-Bera and Kolmogorov-Smirnov tests, as well as all excess kurtosis values being greater than zero, substantial motivation was provided for the use of the Student's t-distribution when fitting GARCH models. Given the various results obtained around volatility, parameter estimates, RMSE and information criteria, it was concluded that a window length of 600 is perhaps the most appropriate when modelling GARCH volatility.
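A minimal sketch of the rolling-window idea follows, assuming Python's `arch` package and simulated returns in place of the FTSE/JSE-ALSI series; the window length and step size are illustrative, not the thesis settings.

```python
# Sketch: refit S-GARCH(1,1) or E-GARCH(1,1) on a moving window and collect
# parameter estimates, the last fitted volatility and a 1-day forecast.
import numpy as np
import pandas as pd
from arch import arch_model

rng = np.random.default_rng(0)
returns = pd.Series(rng.standard_t(df=8, size=3000))          # toy daily log-returns

def rolling_garch(returns, window, egarch=False):
    rows = []
    for end in range(window, len(returns), 50):               # refit every 50 days to save time
        win = returns.iloc[end - window:end]
        am = arch_model(win, vol='EGARCH' if egarch else 'GARCH',
                        p=1, q=1, dist='t')                   # Student's t innovations
        res = am.fit(disp='off')
        rows.append({'end': end,
                     'sigma_last': res.conditional_volatility.iloc[-1],
                     'forecast_1d': np.sqrt(res.forecast(horizon=1).variance.values[-1, 0]),
                     **res.params.to_dict()})
    return pd.DataFrame(rows)

print(rolling_garch(returns, window=600).head())              # e.g. the 600-day window
```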
- ItemAn analysis of income and poverty in South Africa(Stellenbosch : University of Stellenbosch, 2007-03) Malherbe, Jeanine Elizabeth; De Wet, Tertius; Viljoen, H.; Neethling, Ariane; University of Stellenbosch. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.The aim of this study is to assess the welfare of South Africa in terms of poverty and inequality. This is done using the Income and Expenditure Survey (IES) of 2000, released by Statistics South Africa, and reviewing the distribution of income in the country. A brief literature review of similar studies is given along with a broad definition of poverty and inequality. A detailed description of the dataset used is given together with aspects of concern surrounding the dataset. An analysis of poverty and income inequality is made using datasets containing the continuous income variable, as well as a created grouped income variable. Results from these datasets are compared and conclusions made on the use of continuous or grouped income variables. Covariate analysis is also applied in the form of biplots. A brief overview of biplots is given and it is then used to obtain a graphical description of the data and identify any patterns. Lastly, the conclusions made in this study are put forward and some future research is mentioned.
- ItemAnalysis to indicate the impact Hindsight Bias have on the outcome when forecasting of stock in the South African equity market(Stellenbosch : Stellenbosch University, 2023-12) Heyneke, Anton Lafrass; Conradie, Willie; Alfeus, Mesias; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.ENGLISH SUMMARY: A novel Artificial Neural Network (ANN) framework presented in this study has the ability to mimic the effect that cognitive biases, specifically hindsight bias, have on the financial market. This study investigates how hindsight bias influences models and their outcomes. During this study the hindsight bias effect will be measured within a South African context. The decisions that people make when faced with uncertainty are characterized by heuristic judgments and cognitive biases. If these characteristics are systematic and confirmed through research and literature related to this topic, they would form a quintessential part of the explanation of the behaviour of financial markets. This research presents a methodology that could be used to model the impact of cognitive biases on the financial markets. In this study, an ANN will be used as a stand-in for the decision-making process of an investor. It is important to note that the selection of the companies, on which the ANN will be trained, validated and tested, demonstrated cognitive bias during the study's preparation. Though there are many cognitive biases that have been identified in the literature on behavioural finance, this study will concentrate solely on the impact of hindsight bias. On financial markets, hindsight bias manifests when outcomes seem more predictable after they have already happened. This study attempts, and to some degree succeeds, to replicate the return characteristics of the ten chosen companies for the assessment period from 2010 to 2021. The study described here may still be subject to various cognitive biases and systemic behavioural errors in addition to the hindsight bias. The further application of this technique will stimulate further research with respect to the influence of investor behaviour on financial markets.
- ItemThe application and testing of smart beta strategies in the South African market(Stellenbosch : Stellenbosch University, 2018-03) Viljoen, Jacobus; Conradie, W. J.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.ENGLISH SUMMARY: Smart Beta portfolios have recently prompted great interest both from academic researchers and market practitioners. Investors are attracted by the performances produced by these portfolios compared to the traditional market capitalisation weighted indices. The question that this thesis attempts to answer is: Do smart beta portfolios outperform the traditional cap-weighted indices in the South African market? According to BlackRock’s smart beta guide (Ang, 2015), smart beta strategies aim to capture stock return drivers through rules-based, transparent strategies. They are generally long only and usually implemented within an asset class, in the case of this assignment only equity. Smart beta is thus an investment strategy that positions itself between active and passive investing. Smart beta strategies are active in the sense that they invest in factors that drive return to improve risk-adjusted returns. In the same way, these strategies are closely related to passive strategies in that they are transparent, systematic and rules based. In this assignment five different fundamental factor portfolios (value, quality, momentum, volatility and a combination of the four, called multi-factor) were created based on the smart beta methodology. The factors that were used are well researched in the market and have been proven to provide investors with excess return over the market. Firstly, stock selection was done using two different techniques (time series comparison and cross-sectional comparison). The best stocks were selected based on their fundamental factor characteristics. Secondly, two different smart beta weighting strategies as well as a market-cap weighting strategy were applied to the selected stocks in order to create the various portfolios. The risk and return characteristics of the created portfolios were compared to those of the two benchmarks (the JSE All Share Index and the JSE Shareholder Weighted All Share Index). The smart beta portfolios created in this thesis outperformed the benchmarks as well as the market-cap weighted portfolios. Lastly, the estimation of the macroeconomic exposure of the smart beta portfolios, using a methodology outlined in a Citi Research paper, is presented (Montagu, Krause, Burgess, Jalan, Murray, Chew and Yusuf, 2015).
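The two-step construction described above (select on a factor, then weight) can be sketched in a few lines; the factor, universe size and weighting schemes below are illustrative assumptions, not the assignment's actual data or rules.

```python
# Sketch: (1) cross-sectional selection on a fundamental factor score,
# (2) a smart beta weighting scheme versus market-cap weighting.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
stocks = pd.DataFrame({
    'momentum': rng.normal(size=40),                 # illustrative factor score
    'market_cap': rng.lognormal(mean=10, sigma=1, size=40),
}, index=[f'STK{i:02d}' for i in range(40)])

selected = stocks.nlargest(10, 'momentum')           # best stocks on the factor

weights = pd.DataFrame({
    'cap_weight': selected['market_cap'] / selected['market_cap'].sum(),
    'equal_weight': pd.Series(1 / len(selected), index=selected.index),
    # inverse-volatility weights would be another common smart beta choice
})
print(weights.round(3))
```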
- ItemAn application of copulas to improve PCA biplots for multivariate extremes(Stellenbosch : Stellenbosch University, 2018-12) Perrang, Justin; Van der Merwe, Carel Johannes; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.ENGLISH SUMMARY : Principal Component Analysis (PCA) biplots are a valuable means of visualising high dimensional data. The application of PCA biplots over a wide variety of research areas containing multivariate data is well documented. However, the application of biplots to financial data is limited. This is partly due to PCA being an inadequate means of dimension reduction for multivariate data that is subject to extremes. This implies that its application to financial data is greatly diminished, since extreme observations are common in financial data. Hence, the purpose of this research is to develop a method to accommodate PCA biplots for multivariate data containing extreme observations. This is achieved by fitting an elliptical copula to the data and deriving a correlation matrix from the copula parameters. The copula parameters are estimated from only extreme observations and as such the derived correlation matrices contain the dependencies of extreme observations. Finally, applying PCA to such an “extremal” correlation matrix more efficiently preserves the relationships underlying the extremes and a more refined PCA biplot can be constructed.
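The core idea can be sketched as follows, under assumptions that are mine rather than the thesis': a Gaussian (elliptical) copula, the Kendall's tau inversion rho = sin(pi*tau/2) to obtain the correlation, and a simple rule for selecting joint-tail observations.

```python
# Sketch: estimate an "extremal" correlation matrix from tail observations
# via an elliptical copula, then apply PCA to that matrix.
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0, 0], [[1, .6, .3], [.6, 1, .5], [.3, .5, 1]], 2000)

U = X.argsort(0).argsort(0) / (len(X) + 1)           # pseudo-observations (ranks)
tail = U[(U > 0.90).any(axis=1)]                     # keep rows with at least one extreme

p = X.shape[1]
R = np.eye(p)
for i in range(p):
    for j in range(i + 1, p):
        tau, _ = kendalltau(tail[:, i], tail[:, j])
        R[i, j] = R[j, i] = np.sin(np.pi * tau / 2)  # elliptical-copula correlation

eigval, eigvec = np.linalg.eigh(R)                   # PCA of the extremal correlation
print("extremal correlation matrix:\n", R.round(2))
print("principal axes (biplot directions):\n", eigvec[:, ::-1].round(2))
```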
- ItemAn application of geometric data analysis techniques to South African crime data(Stellenbosch : Stellenbosch University, 2016-12) Gurr, Benjamin William; Le Roux, Niel J.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics & Actuarial Science.ENGLISH SUMMARY : Due to the high levels of violent crime in South Africa, improved methods of analysis are required in order to better scrutinize these statistics. This study diverges from traditional multivariate data analysis, and provides alternative methods for analyzing crime data in South Africa. This study explores the applications of several types of geometric data analysis (GDA) methods to the study of crime in South Africa; these include correspondence analysis, the correspondence analysis biplot, and the log-ratio biplot. Chapter 1 discusses the importance of data visualization in modern day statistics, as well as geometric data analysis and its role as a multivariate analytical tool. Chapter 2 provides the motivation for the choice of subject matter to be explored in this study. As South Africa is recognized as having the eighth highest homicide rate in the world, along with a generally high level of violent crime, the analysis is conducted on reported violent crime statistics in South Africa. Additionally, the possible data collection challenges are also discussed in Chapter 2. The study is conducted on the violent crime statistics in South Africa for the 2004-2013 reporting period, the structure and details of which are discussed in Chapter 3. In order for this study to be comparable, it is imperative that the definitions of all crimes included are well defined. Chapter 3 places a large emphasis on declaring the exact definition of the various crimes which are utilized in this study, as recorded by the South African Police Services. The more common approaches to graphically representing crime data in South Africa are explored in Chapter 4. Chapter 4 also marks the beginning of the analysis of the South African crime data for the 2004-2013 reporting period. Univariate graphical techniques (line graphs and bar plots) are used to analyze the data for the 2004-2013 time period. However, as is to be expected, they are hampered by serious limitations. In an attempt to improve on the analysis, focus is shifted to geometric data analysis techniques. The general methodologies of correspondence analysis, biplots, and correspondence analysis biplots are discussed in Chapter 5. Both the algorithms and the construction of the associated figures are discussed for the aforementioned methods. The application of these methodologies is implemented in Chapter 6. The results of Chapter 6 suggest some improvement upon the results of Chapter 4. These techniques provided a geometric setting where both the crimes and provinces could be represented in a single diagram, and where the relationships between both sets of variables could be analyzed. The correspondence analysis biplot proved to have some advantages in comparison to the correspondence analysis maps, as it can display numerous metrics, provide multiple calibrated axes, and allows for greater manipulation of the figure itself. Chapter 7 introduced the concept of compositional data and the log-ratio biplot. The log-ratio biplot combined the functionality of the biplot with a comparability measure in terms of a ratio. The log-ratio biplot proved useful in the analysis of the South African crime data as it expressed differences on a ratio scale as multiplicative differences. Additionally, log-ratio analysis has the property of being sub-compositionally coherent. Chapter 8 provides the summary and conclusions of this study. It was found that Gauteng categorically has the largest number of reported violent crimes over the reported period (2004-2013). However, the Western Cape proved to have the highest violent crime rates per capita of all the South African provinces. It was noted that over the past decade South Africa has experienced a downward trend in the number of reported murders. However, there has been a spike in the number of reported cases of murder in more recent years. This spike is mostly driven by the large increases in reported murder cases in the Western Cape, Gauteng and KwaZulu-Natal. The most notable trend seen in the South African crime data is the rapid increase in the number of reported cases of drug-related crimes over the reported period across all provinces, but more noticeably in the Western Cape and Gauteng. On the whole, the majority of the South African provinces share similar violent crime profiles; however, Gauteng and the Western Cape deviate from the other provinces. This is due to Gauteng's high association with robbery with aggravating circumstances and the Western Cape's high association with drug-related crime. This study presents some evidence that the use of geometric data analysis techniques provides an improvement upon traditional reporting methods for the South African crime data. Geometric data analysis and its related methods should thus form an integral part of any study conducted into the topic at hand.
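A compact sketch of classical correspondence analysis, the central technique in Chapters 5 and 6, is given below; the province-by-crime counts are toy numbers, not the SAPS figures.

```python
# Sketch: correspondence analysis via SVD of the standardised residuals
# of a contingency table (rows = provinces, columns = crime categories).
import numpy as np

N = np.array([[520, 130, 310],
              [410, 250, 120],
              [150,  90,  60],
              [300, 220, 400]], dtype=float)

P = N / N.sum()                                  # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)              # row and column masses
S = np.diag(r**-0.5) @ (P - np.outer(r, c)) @ np.diag(c**-0.5)

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
F = np.diag(r**-0.5) @ U * sv                    # principal row coordinates
G = np.diag(c**-0.5) @ Vt.T * sv                 # principal column coordinates

print("inertia explained:", (sv**2 / (sv**2).sum()).round(3))
print("row (province) coordinates:\n", F[:, :2].round(3))
print("column (crime) coordinates:\n", G[:, :2].round(3))
```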
- ItemApplication of statistics and machine learning in healthcare(Stellenbosch : Stellenbosch University, 2019-04) Van der Merwe, Schalk Gerhardus; Muller, Chris; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.ENGLISH SUMMARY : Clinical performance and cost efficiency are key focus areas in the healthcare industry, since providing quality and affordable healthcare is a continuing challenge. The goal of this research is to use statistical analyses and modelling to improve efficiency in healthcare by focussing on readmissions. Patients readmitted to hospital can indicate poor clinical care and have immense cost implications. It is advantageous if readmissions can be kept to a minimum. Generally, stakeholders view strategies to address the clinical performance of healthcare providers, such as readmission rate, as mainly clinical in nature. However, this study will investigate the potential role of machine learning in the improvement of clinical outcomes. This study defines machine learning as the identification of complex patterns (linear or non-linear) present in observed data, with the goal of predicting a certain outcome for new cases by mimicking the true underlying pattern in the population which led to the observed outcomes in the sample, while limiting rigid structural assumptions throughout. The question at hand is whether patients that are at risk of readmission can be identified, along with the risk factors that can be associated with an increase in the likelihood of the event of readmission occurring. If so, this can provide an opportunity to reduce the number of readmissions and thus avoid the resulting cost and clinical consequences. Once a patient is identified as being at risk of readmission, there will be an opportunity for early clinical intervention. In addition, the model will provide the opportunity to calculate risk scores for patients, which in turn will enable risk adjustment of the reported readmission rates. The data under consideration in this study is healthcare data generated by the operations of an international healthcare provider, Mediclinic International. The data that the research is based on is patient data captured at hospital level in all Mediclinic hospitals operational in Mediclinic International's Southern African platform. Several statistical algorithms exist to model the responses of interest. The techniques consist of simple, well known techniques, as well as techniques that are more advanced. Logistic regression and decision trees are examples of simple techniques, while neural networks and support vector machines (SVM) are more complex. SAS Enterprise Guide is the software of choice for the data preparation, while SAS Enterprise Miner is the software used for the machine learning component of this study. The study aims to provide insight into machine learning techniques, as well as construct machine learning models that produce reasonable accuracy in terms of prediction of readmissions.
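To illustrate the simplest of the listed techniques, here is a minimal logistic-regression sketch that yields patient-level readmission risk scores; the synthetic records and feature names are purely illustrative (the study itself uses SAS and the Mediclinic data).

```python
# Sketch: a readmission classifier whose predicted probabilities serve as risk scores.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    'age': rng.integers(18, 90, n),
    'length_of_stay': rng.poisson(4, n),
    'prior_admissions': rng.poisson(1, n),
})
logit = -4 + 0.02 * df['age'] + 0.15 * df['length_of_stay'] + 0.5 * df['prior_admissions']
df['readmitted'] = rng.binomial(1, 1 / (1 + np.exp(-logit)))   # synthetic outcome

X_tr, X_te, y_tr, y_te = train_test_split(df.drop(columns='readmitted'),
                                          df['readmitted'], random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

risk_scores = model.predict_proba(X_te)[:, 1]     # probability of readmission per patient
print("AUC:", round(roc_auc_score(y_te, risk_scores), 3))
print("highest-risk patients (test index):", np.argsort(risk_scores)[-5:])
```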
- ItemApplication of the moving block bootstrap method to resampled efficiency : the impact of the choice of block size(Stellenbosch : Stellenbosch University, 2021-12) Retief, Jan; Conradie, W. J.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.ENGLISH SUMMARY : Modern portfolio theory was first developed in the 1950s and revolutionised the way in which financial information is used to construct portfolios. Unfortunately, the theory is limited by the sensitivity of the constructed portfolio's weights to uncertainty in the constituents' risk and return estimates. Various advancements to the classical theory have been proposed to address this problem. One of these methods is called Resampled Efficiency (RE), which addresses the sensitivity problem by sampling expected return and risk estimates for each security included in the portfolio. Multiple portfolios are then built based on the sampled returns to construct a single averaged portfolio. The result is more robust portfolios that have been proven to have better out-of-sample performance. There are two methods available for sampling the security expected returns and risk: (1) generating random security returns (via Monte Carlo methods) or (2) using bootstrapping techniques based on observed security returns. For the second method, the moving block bootstrap (MBB) method can be used to construct bootstrapped samples for a non-stationary series of security returns. The MBB method works by ordering the historical series of observed returns into a pre-defined number of blocks (block sizes). As such, the choice of block size can have a significant effect on the sample that is obtained and used for portfolio construction. The goal of this study was to fully investigate what impact the choice of block size can have on the out-of-sample performance of resampled efficiency portfolios. After a literature review that assessed modern portfolio theory, resampled efficiency and the moving block bootstrap method, RE portfolios were hypothetically built based on actual security return observations. The constituents of the FTSE/JSE Top 40 index were used to construct RE portfolios for different choices of block sizes for the period between 2016 and 2017. The results indicate that the block size used can have a significant impact on the out-of-sample performance of the constructed portfolios; however, no single block size or range of block sizes could be found that consistently results in the best performing RE portfolios. For different periods, and different levels of risk, the ideal block size differs.
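A minimal sketch of the moving block bootstrap itself follows; the block size and simulated return matrix are illustrative, and in the study each such resample would feed a resampled-efficiency optimisation.

```python
# Sketch: moving block bootstrap (MBB) of a multivariate return series,
# built from overlapping blocks of a chosen block size.
import numpy as np

def moving_block_bootstrap(returns, block_size, rng):
    """Resample a (T x k) return matrix by concatenating overlapping blocks."""
    T = len(returns)
    n_blocks = int(np.ceil(T / block_size))
    starts = rng.integers(0, T - block_size + 1, size=n_blocks)   # random block starts
    sample = np.concatenate([returns[s:s + block_size] for s in starts])
    return sample[:T]                                             # trim to original length

rng = np.random.default_rng(0)
returns = rng.normal(0.0005, 0.01, size=(500, 5))                 # 500 days, 5 securities

boot = moving_block_bootstrap(returns, block_size=20, rng=rng)
print(boot.shape, boot.mean(axis=0).round(4))                     # input to the RE optimiser
```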
- ItemThe appropriateness of ISDA SIMM for delta risk initial margin calculations in the South African over-the-counter interest rate swap market(Stellenbosch : Stellenbosch University, 2020-12) Cronje, Robert; Van der Merwe, Carel Johannes; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.ENGLISH SUMMARY : This research assignment assesses the appropriateness of the calibrations in the ISDA SIMM for calculating delta risk initial margin (IM) in the current over-the-counter interest rate swap market in South Africa. Three main experiments are conducted that include novel ways of delineating and uncovering potential risks in the ISDA SIMM. By comparing the delta risk IM obtained using the standard model and that of a filtered historical simulation expected shortfall model that is calibrated to the South African swaps index curve, the IM appropriateness can be inspected for various profiles based on their relative sensitivities to the tenors of the swap curve. The experiments show that the ISDA SIMM is appropriate in most cases, but due to its broad calibrations, some shortfalls are shown to exist. The results are standardised throughout and are independent of absolute size, as liquidity and concentration features are deliberately excluded. This makes the results more generally applicable and also makes all the results obtained in the analyses comparable. The framework developed here can be replicated by practitioners using their own systems in order to obtain results that meet their internal calibrations as well as their specific risk and return requirements.
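To give a flavour of the benchmark side of that comparison, here is a sketch of a filtered historical simulation (FHS) expected shortfall for a linear delta exposure; the EWMA filter, the 97.5% level and the toy rate changes are all assumptions for illustration and do not reproduce the assignment's calibration.

```python
# Sketch: filtered historical simulation expected shortfall for a delta exposure.
import numpy as np

rng = np.random.default_rng(0)
rate_changes = rng.normal(0, 0.0007, 1500) * np.linspace(0.5, 1.5, 1500)  # toy daily moves

def fhs_expected_shortfall(changes, delta, lam=0.94, alpha=0.975):
    # EWMA volatility filter over the historical changes
    var = np.empty_like(changes)
    var[0] = changes.var()
    for t in range(1, len(changes)):
        var[t] = lam * var[t - 1] + (1 - lam) * changes[t - 1] ** 2
    sigma = np.sqrt(var)
    standardised = changes / sigma                  # devolatilised innovations
    scenarios = standardised * sigma[-1]            # rescaled to current volatility
    pnl = delta * scenarios                         # linear (delta) P&L per scenario
    cutoff = np.quantile(pnl, 1 - alpha)
    return -pnl[pnl <= cutoff].mean()               # average loss beyond the quantile

print("FHS ES margin proxy:", round(fhs_expected_shortfall(rate_changes, delta=1e6), 2))
```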
- ItemAspects of copulas and goodness-of-fit(Stellenbosch : Stellenbosch University, 2008-12) Kpanzou, Tchilabalo Abozou; De Wet, Tertius; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.The goodness-of-fit of a statistical model describes how well it fits a set of observations. Measures of goodness-of-fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, for example to test for normality, to test whether two samples are drawn from identical distributions, or whether outcome frequencies follow a specified distribution. Goodness-of-fit for copulas is a special case of the more general problem of testing multivariate models, but is complicated due to the difficulty of specifying marginal distributions. In this thesis, the goodness-of-fit test statistics for general distributions and the tests for copulas are investigated, but prior to that an understanding of copulas and their properties is developed. In fact copulas are useful tools for understanding relationships among multivariate variables, and are important tools for describing the dependence structure between random variables. Several univariate, bivariate and multivariate test statistics are investigated, the emphasis being on tests for normality. Among goodness-of-fit tests for copulas, tests based on the probability integral transform, Rosenblatt's transformation, as well as some dimension reduction techniques are considered. Bootstrap procedures are also described. Simulation studies are conducted to first compare the power of rejection of the null hypothesis of the Clayton copula by four different test statistics under the alternative of the Gumbel-Hougaard copula, and also to compare the power of rejection of the null hypothesis of the Gumbel-Hougaard copula under the alternative of the Clayton copula. An application of the described techniques is made to a practical data set.
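The Rosenblatt-transform idea mentioned above can be sketched for the Clayton copula as follows; the parameter value, sample size and use of a simple Kolmogorov-Smirnov check (with the parameter treated as known rather than estimated) are simplifying assumptions.

```python
# Sketch: Rosenblatt-transform goodness-of-fit check for the Clayton copula.
import numpy as np
from scipy.stats import kstest

def simulate_clayton(n, theta, rng):
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = (u**(-theta) * (w**(-theta / (theta + 1)) - 1) + 1) ** (-1 / theta)
    return u, v

def rosenblatt_clayton(u, v, theta):
    # conditional distribution C(v | u) of the Clayton copula
    return u**(-theta - 1) * (u**(-theta) + v**(-theta) - 1) ** (-1 - 1 / theta)

rng = np.random.default_rng(0)
u, v = simulate_clayton(1000, theta=2.0, rng=rng)

e2 = rosenblatt_clayton(u, v, theta=2.0)                 # Uniform(0,1) under H0
print("H0 Clayton(2.0):", kstest(e2, 'uniform'))         # large p-value expected

e2_wrong = rosenblatt_clayton(u, v, theta=0.3)           # misspecified parameter
print("H0 Clayton(0.3):", kstest(e2_wrong, 'uniform'))   # small p-value expected
```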
- ItemAspects of multi-class nearest hypersphere classification(Stellenbosch : Stellenbosch University, 2017-12) Coetzer, Frances; Lamont, Morné Michael Connell; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.ENGLISH SUMMARY : Using hyperspheres in the analysis of multivariate data is not a common practice in Statistics. However, hyperspheres have some interesting properties which are useful for data analysis in the following areas: domain description (finding a support region), detecting outliers (novelty detection) and the classification of objects into known classes. This thesis demonstrates how a hypersphere is fitted around a single dataset to obtain a support region and an outlier detector. The all-enclosing and 𝜐-soft hyperspheres are derived. The hyperspheres are then extended to multi-class classification, which is called nearest hypersphere classification (NHC). Different aspects of multi-class NHC are investigated. To study the classification performance of NHC we compared it to three other classification techniques. These techniques are support vector machine classification, random forests and penalised linear discriminant analysis. Using NHC requires choosing a kernel function and in this thesis, the Gaussian kernel will be used. NHC also depends on selecting an appropriate kernel hyper-parameter 𝛾 and a tuning parameter 𝐶. The behaviour of the error rate and the fraction of support vectors for different values of 𝛾 and 𝐶 will be investigated. Two methods will be investigated to obtain the optimal 𝛾 value for NHC. The first method uses a differential evolution procedure to find this value. The R function DEoptim() is used to execute this. The second method uses the R function sigest(). The first method is dependent on the classification technique and the second method is executed independently of the classification technique.
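The sketch below illustrates the nearest-hypersphere idea with a Gaussian kernel in a deliberately simplified form: each class centre is taken as the feature-space class mean rather than the 𝜐-soft SVDD centre, so it is an approximation of NHC, not the thesis method, and the γ value and data are illustrative.

```python
# Sketch: assign each test point to the class whose (kernel feature-space) mean is nearest.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel

X, y = make_blobs(n_samples=300, centers=3, random_state=0)
X_train, y_train, X_test = X[:250], y[:250], X[250:]
gamma = 0.1                                             # kernel hyper-parameter

def kernel_dist2_to_class_mean(x, Xc, gamma):
    # ||phi(x) - mean(phi(Xc))||^2 = k(x,x) - 2*mean k(x,Xc) + mean K(Xc,Xc)
    return (1.0
            - 2.0 * rbf_kernel(x[None, :], Xc, gamma=gamma).mean()
            + rbf_kernel(Xc, Xc, gamma=gamma).mean())

classes = np.unique(y_train)
preds = [classes[np.argmin([kernel_dist2_to_class_mean(x, X_train[y_train == c], gamma)
                            for c in classes])]
         for x in X_test]
print(preds[:10])
```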
- ItemAspects of some exotic options(Stellenbosch : University of Stellenbosch, 2007-12) Theron, Nadia; Conradie, W. J.; University of Stellenbosch. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.The use of options on various stock markets over the world has introduced a unique opportunity for investors to hedge, speculate, create synthetic financial instruments and reduce funding and other costs in their trading strategies. The power of options lies in their versatility. They enable an investor to adapt or adjust her position according to any situation that arises. Another benefit of using options is that they provide leverage. Since options cost less than stock, they provide a high-leverage approach to trading that can significantly limit the overall risk of a trade, or provide additional income. This versatility and leverage, however, come at a price. Options are complex securities and can be extremely risky. In this document several aspects of trading and valuing some exotic options are investigated. The aim is to give insight into their uses and the risks involved in their trading. Two volatility-dependent derivatives, namely compound and chooser options; two path-dependent derivatives, namely barrier and Asian options; and lastly binary options, are discussed in detail. The purpose of this study is to provide a reference that contains both the mathematical derivations and detail in valuating these exotic options, as well as an overview of their applicability and use for students and other interested parties.
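As a small worked example of the path-dependent options discussed above, here is a Monte Carlo valuation sketch for an arithmetic-average Asian call under geometric Brownian motion; the market parameters are illustrative only.

```python
# Sketch: Monte Carlo price of an arithmetic-average Asian call.
import numpy as np

def asian_call_mc(S0, K, r, sigma, T, n_steps=252, n_paths=100_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    Z = rng.standard_normal((n_paths, n_steps))
    # geometric Brownian motion paths under the risk-neutral measure
    log_paths = np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z, axis=1)
    S = S0 * np.exp(log_paths)
    payoff = np.maximum(S.mean(axis=1) - K, 0.0)     # payoff on the path average
    return np.exp(-r * T) * payoff.mean()

print("Asian call value:", round(asian_call_mc(S0=100, K=100, r=0.07, sigma=0.25, T=1.0), 3))
```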
- ItemA Bayesian extreme value approach to the optimal reinsurance problem in a multivariate risk setting(Stellenbosch : Stellenbosch University, 2023-12) Steenkamp, Shaun Francois; Harvey, Justin; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.ENGLISH SUMMARY: This thesis investigates a Bayesian extreme value theory approach to analyse the optimal reinsurance problem, more specifically the optimal layer selection of an excess of loss reinsurance contract. This thesis suggests a simulation approach to the optimization of the layer selection. This thesis proposes a multivariate excess of loss (XL) reinsurance structure, referred to as the simultaneous XL reinsurance structure, and applies the developed optimization algorithm to this structure in several numerical examples. The approach takes a particular focus on extreme risks, thereby investigating the optimal reinsurance contract that best protects the insurance company from rare large claims. The methodology is explained for a univariate risk case, thereafter the model is extended to the bivariate and the multivariate risk cases. The optimal reinsurance agreement can be investigated using a variety of different models. This thesis develops a risk measure minimization model, with a focus on the conditional tail expectation (CTE) risk measure. The model allows for the insurance company's reinsurance budget as a constraint in the optimization problem. Bayesian techniques are especially useful in problems where data is sparse, therefore this thesis suggests utilizing a Bayesian approach to the optimal reinsurance problem where rare large claims are considered. A Bayesian extreme value theory approach could improve the process of investigating the optimal reinsurance problem by utilising Markov Chain Monte Carlo (MCMC) methods to supplement the information from the data that the insurance company has available. The approach is extended into the bivariate and multivariate risk cases where a fictitious insurer, involved in various lines of business, is considered. The dependence structure is modelled using a copula approach. Numerical examples are examined, and the results are interpreted. This thesis takes a focus on the tail of the data, thereby evaluating the optimal excess of loss reinsurance contract for very large claims with very small probabilities. The research suggests an algorithm for evaluating the optimal reinsurance strategy in a multivariate risk environment for insurance companies involved in different lines of business. The analysis will improve understanding and assist decision making on the reinsurance strategy from the insurer's perspective.
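A stripped-down, univariate sketch of the simulation approach follows: simulate heavy-tailed claims, then grid-search the XL layer that minimises the insurer's retained CTE subject to a premium budget. The fixed GPD parameters (rather than posterior draws), the premium loading and the budget are assumptions for illustration only.

```python
# Sketch: choosing an excess-of-loss layer by minimising the retained CTE
# under a reinsurance budget constraint.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
claims = genpareto.rvs(c=0.3, scale=10.0, size=200_000, random_state=rng)  # simulated losses

def cte(x, level=0.99):
    q = np.quantile(x, level)
    return x[x >= q].mean()                           # conditional tail expectation

loading, budget = 1.3, 8.0
best = None
for retention in np.linspace(5, 60, 12):              # layer attachment point
    for limit in np.linspace(10, 150, 15):            # layer width
        ceded = np.clip(claims - retention, 0, limit)
        premium = loading * ceded.mean()              # expected-value premium principle
        if premium > budget:
            continue                                  # violates the reinsurance budget
        retained_cte = cte(claims - ceded)
        if best is None or retained_cte < best[0]:
            best = (retained_cte, retention, limit, premium)

print("min CTE %.2f at retention=%.1f, limit=%.1f, premium=%.2f" % best)
```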
- ItemBayesian machine learning : theory and applications(Stellenbosch : Stellenbosch University, 2020-12) Payne, Megan Wendy; Harvey, Justin; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.ENGLISH SUMMARY : Machine learning problems in general are concerned with the ability of different methods and algorithms to extract useful and interpretable information from large datasets, possibly ones which are corrupt due to noisy measurements or errors in data capturing. As the size and complexity of data increases, the demand for efficient and robust machine learning techniques is greater than ever. All statistical techniques can be divided into either a frequentist approach or a Bayesian approach depending on how probability is interpreted and how the unknown parameter set is treated. Bayesian methods have been present for several centuries; however, it was the advent of improved computational power and memory storage that catalysed the use of Bayesian modelling approaches in a wider range of scientific fields. This is largely because many Bayesian methods require the computation of complex integrals, sometimes ones that are analytically intractable to compute in closed form, and these have become more accessible now that approximation methods are less time-consuming to execute. This thesis considers a Bayesian approach to statistical modelling and takes the form of a postgraduate course in Bayesian machine learning. A comprehensive overview of several machine learning topics is covered from a Bayesian perspective and, in many cases, compared with their frequentist counterparts as a means of illustrating some of the benefits that arise when making use of Bayesian modelling. The topics covered are focused on the more popular methods in the machine learning literature. Firstly, Bayesian approaches to classification techniques as well as a fully Bayesian approach to linear regression are discussed. Further, no discussion on machine learning methods would be complete without consideration of variable selection techniques; thus, a range of Bayesian variable selection and sparse Bayesian learning methods are considered and compared. Finally, probabilistic graphical models are presented since these methods form an integral part of Bayesian artificial intelligence. Included with the discussion of each technique is a practical implementation. These examples are all easily reproducible and demonstrate the performance of each method. Where applicable, a comparison of the Bayesian and frequentist methods is provided. The topics covered are by no means exhaustive of the Bayesian machine learning literature but rather provide a comprehensive overview of the most commonly encountered methods.
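One of the simplest of the listed topics, fully Bayesian linear regression, can be sketched with a conjugate Gaussian prior and known noise variance; the prior precision and synthetic data below are illustrative assumptions, not the thesis examples.

```python
# Sketch: conjugate Bayesian linear regression (known noise variance),
# compared with the frequentist OLS estimate.
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 100, 0.25
X = np.column_stack([np.ones(n), rng.normal(size=n)])     # intercept + one predictor
true_beta = np.array([1.0, 2.0])
y = X @ true_beta + rng.normal(scale=np.sqrt(sigma2), size=n)

tau2 = 10.0                                               # prior: beta ~ N(0, tau2 * I)
prior_prec = np.eye(2) / tau2
post_cov = np.linalg.inv(prior_prec + X.T @ X / sigma2)   # posterior covariance
post_mean = post_cov @ (X.T @ y / sigma2)                 # posterior mean

print("posterior mean:", post_mean.round(3))
print("posterior sd:  ", np.sqrt(np.diag(post_cov)).round(3))
print("OLS estimate:  ", np.linalg.lstsq(X, y, rcond=None)[0].round(3))
```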
- ItemBinary classification trees : a comparison with popular classification methods in statistics using different software(Stellenbosch : Stellenbosch University, 2002-12) Lamont, Morné Michael Connell; Louw, N.; Stellenbosch University. Faculty of Economic and Management Sciences. Department of Statistics and Actuarial Science.ENGLISH ABSTRACT: Consider a data set with a categorical response variable and a set of explanatory variables. The response variable can have two or more categories and the explanatory variables can be numerical or categorical. This is a typical setup for a classification analysis, where we want to model the response based on the explanatory variables. Traditional statistical methods have been developed under certain assumptions such as: the explanatory variables are numeric only and/or the data follow a multivariate normal distribution. In practice such assumptions are not always met. Different research fields generate data that have a mixed structure (categorical and numeric) and researchers are often interested in using all these data in the analysis. In recent years robust methods such as classification trees have become the substitute for traditional statistical methods when the above assumptions are violated. Classification trees are not only an effective classification method, but offer many other advantages. The aim of this thesis is to highlight the advantages of classification trees. In the chapters that follow, the theory of and further developments on classification trees are discussed. This forms the foundation for the CART software which is discussed in Chapter 5, as well as other software in which classification tree modeling is possible. We will compare classification trees to parametric, kernel and k-nearest-neighbour discriminant analyses. A neural network is also compared to classification trees and finally we draw some conclusions on classification trees and their comparison with other methods.
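The kind of comparison made in the thesis can be sketched as follows, using scikit-learn stand-ins rather than the CART and other software packages discussed; the dataset and settings are illustrative.

```python
# Sketch: cross-validated comparison of a classification tree with
# linear discriminant analysis and k-nearest neighbours.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
models = {
    "classification tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "linear discriminant": LinearDiscriminantAnalysis(),
    "5-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name:22s} mean CV accuracy = {scores.mean():.3f}")
```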
- ItemBiomedical image analysis of brain tumours through the use of artificial intelligence(Stellenbosch : Stellenbosch University, 2022-04) Di Santolo, Claudia; Muller, C. J. B.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.ENGLISH SUMMARY: Cancer is one of the leading causes of morbidity and mortality on a global scale. Cancer of the brain, more specifically, is one of its rarest forms. One of the major challenges is that of timely diagnosis. In the ongoing fight against cancer, early and accurate detection in combination with effective treatment strategy planning remains one of the best tools for improved patient outcomes and success. Emphasis has been placed on the identification and classification of brain lesions in patients, that is, either the absence or presence of brain tumours. In the case of malignant brain tumours it is critical to classify patients into either high-grade or low-grade brain lesion groups: different gradings of brain tumours have different prognoses, thus different survival rates. The growth in the availability and accessibility of big data due to digitisation has led individuals in the area of bioinformatics, in both academia and industry, to apply and evaluate artificial intelligence techniques. However, one of the most important challenges, not only in the field of bioinformatics but also in other realms, is transforming the raw data into valuable insights and knowledge. In this research thesis artificial intelligence techniques that can detect vital and fundamental underlying patterns in the data are reviewed. The models may provide significant predictive performance to assist with decision making. Much artificial intelligence has been applied to brain tumour classification and segmentation in the research literature. However, in this study the theoretical background of two more traditional machine learning methods, namely 𝑘-nearest neighbours and support vector machines, is discussed. In recent years, deep learning (artificial neural networks) has gained prominence due to its ability to handle copious amounts of data. The specialised version of the artificial neural network that is reviewed is the convolutional neural network. The rationale behind this particular technique is that it is applied to visual imagery. In addition to making use of the convolutional neural network architecture, the study reviews the training of neural networks, which involves the use of optimisation techniques and is considered to be one of the most difficult parts. Utilising only one learning algorithm (optimisation technique) in the architecture of convolutional neural network models for classification tasks may be regarded as insufficient unless there is strong support in the design of the analysis for using a particular technique. Nine state-of-the-art optimisation techniques formed part of a comparative study to determine if there was any improvement in the classification and segmentation of high-grade or low-grade brain tumours. These machine learning and deep learning techniques have proved to be successful in image classification and, more relevant to this research, brain tumours. To supplement the theoretical knowledge, these artificial intelligence methodologies (models) are applied through the exploration of magnetic resonance imaging scans of brain lesions.
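The optimiser comparison described above can be sketched as follows: the same small convolutional network trained with two of the candidate optimisers. The architecture, the random tensors standing in for MRI slices, and the optimiser settings are all illustrative assumptions.

```python
# Sketch: training one small CNN with different optimisers and comparing the loss.
import torch
import torch.nn as nn

def make_cnn():
    return nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(16 * 16 * 16, 2),               # binary: high-grade vs low-grade
    )

torch.manual_seed(0)
X = torch.randn(256, 1, 64, 64)                   # stand-in for 64x64 MRI slices
y = torch.randint(0, 2, (256,))
loss_fn = nn.CrossEntropyLoss()

for opt_name in ["SGD", "Adam"]:
    model = make_cnn()
    optimiser = (torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
                 if opt_name == "SGD" else torch.optim.Adam(model.parameters(), lr=1e-3))
    for epoch in range(5):                        # full-batch training for brevity
        optimiser.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimiser.step()
    print(f"{opt_name}: final training loss = {loss.item():.3f}")
```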
- Itembipl5 : an R package for reactive calibrated axes PCA biplots(Stellenbosch : Stellenbosch University, 2024-03) Buys, Ruan; van der Merwe, C. J.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.ENGLISH SUMMARY: Principal component analysis biplots with calibrated axes are popular and effective multivariate data visualisation tools. Biplots are however often complex to navigate due to cluttered plotting in the central data area, as well as the limitations that accompany static rendering. The bipl5 package proposes three contributions to the biplot display: i) automated orthogonal parallel translation of the axes to the boundary of the plot to declutter the plot centre; ii) superimposing interclass kernel densities on each axis to investigate class distributions in the data; iii) rendering the final plot as a portable and standalone HTML file with embedded reactivity. This article considers the mathematical and computational implementation of bipl5, and showcases its functionality through an illustrative example.
- ItemA brief introduction to basic multivariate economic statistical process control(Stellenbosch : Stellenbosch University, 2012-12) Mudavanhu, Precious; Van Deventer, P. J. U.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.ENGLISH ABSTRACT: Statistical process control (SPC) plays a very important role in monitoring and improving industrial processes to ensure that products produced or shipped to the customer meet the required specifications. The main tool that is used in SPC is the statistical control chart. The traditional way of statistical control chart design assumed that a process is described by a single quality characteristic. However, according to Montgomery and Klatt (1972), industrial processes and products can have more than one quality characteristic and their joint effect describes product quality. Process monitoring in which several related variables are of interest is referred to as multivariate statistical process control (MSPC). The most vital and commonly used tool in MSPC is the statistical control chart, as in the case of SPC. The design of a control chart requires the user to select three parameters: the sample size, n; the sampling interval, h; and the control limits, k. Several authors have developed control charts based on more than one quality characteristic, among them Hotelling (1947), who pioneered the use of multivariate process control techniques through the development of the T²-control chart, well known as the Hotelling T²-control chart. Since the introduction of the control chart technique, the most common and widely used method of control chart design has been the statistical design. However, according to Montgomery (2005), the design of control charts has economic implications. There are costs that are incurred during the design of a control chart, such as: costs of sampling and testing, costs associated with investigating an out-of-control signal and possible correction of any assignable cause found, costs associated with the production of nonconforming products, etc. This paper gives an overview of the different methods or techniques that have been employed to develop the different economic statistical models for MSPC. The first multivariate economic model presented in this paper is the economic design of Hotelling's T²-control chart to maintain current control of a process, developed by Montgomery and Klatt (1972). This is followed by the work done by Kapur and Chao (1996), in which the concept of creating a specification region for the multiple quality characteristics, together with the use of a multivariate quality loss function, is implemented to minimize total loss to both the producer and the customer. Another approach by Chou et al. (2002) is also presented, in which a procedure is developed that simultaneously monitors the process mean and covariance matrix through the use of a quality loss function. The procedure is based on the test statistic 2 ln L, and the cost model is based on the ideas of Montgomery and Klatt (1972) as well as Kapur and Chao (1996). One example of the use of the variable sample size technique in the economic and economic statistical design of the control chart will also be presented. Specifically, an economic and economic statistical design of the T²-control chart with two adaptive sample sizes (Faraz et al., 2010) will be presented. Faraz et al. (2010) developed a cost model of a variable sample size T²-control chart for the economic and economic statistical design using Lorenzen and Vance's (1986) model. There are several other approaches to the multivariate economic statistical process control (MESPC) problem, but in this project the focus is on cases based on the Phase II stage of the process, where the mean vector and the covariance matrix have been fairly well established and can be taken as known, but both are subject to assignable causes. This latter aspect is often ignored by researchers. Nevertheless, the article by Faraz et al. (2010) is included to give more insight into how more sophisticated approaches may fit in with MESPC, even if only the mean vector may be subject to an assignable cause. Keywords: control chart; statistical process control; multivariate statistical process control; multivariate economic statistical process control; multivariate control chart; loss function.
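The Hotelling T² chart that underlies all of these economic designs can be sketched in a few lines; the in-control mean vector, covariance matrix, subgroup size and false-alarm rate below are illustrative design choices for a Phase II setting with known parameters.

```python
# Sketch: Phase II Hotelling T^2 control chart with known in-control parameters.
import numpy as np
from scipy.stats import chi2

mu0 = np.array([10.0, 5.0])                       # in-control mean vector
Sigma = np.array([[4.0, 1.5], [1.5, 2.0]])        # in-control covariance matrix
Sigma_inv = np.linalg.inv(Sigma)
n, alpha = 5, 0.005                               # subgroup size and false-alarm rate
ucl = chi2.ppf(1 - alpha, df=len(mu0))            # upper control limit (known parameters)

rng = np.random.default_rng(0)
for t in range(10):
    shift = np.array([2.0, 0.0]) if t >= 7 else 0.0      # assignable cause from sample 8
    sample = rng.multivariate_normal(mu0 + shift, Sigma, size=n)
    diff = sample.mean(axis=0) - mu0
    T2 = n * diff @ Sigma_inv @ diff              # Hotelling T^2 for the subgroup mean
    print(f"sample {t + 1}: T2 = {T2:6.2f}  {'OUT OF CONTROL' if T2 > ucl else 'in control'}")
```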