Doctoral Degrees (Statistics and Actuarial Science)

Recent Submissions

  • Item
    A quantitative analysis of investor over-reaction and under-reaction in the South African Equity Market : a mathematical statistical approach
    (Stellenbosch : Stellenbosch University, 2022-04) Mbonda Tiekwe, Aude Ines; Conradie, Willie; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
    ENGLISH SUMMARY: One of the basic foundations of traditional finance is the theory underlying the efficient market hypothesis (EMH). The EMH states that stocks are fairly and accurately priced, making it impossible for investors to use stock selection, technical analysis, or market timing to out-perform the market by earning abnormal returns. Several schools of thought have challenged the EMH by presenting empirical evidence of market anomalies, which seem to contradict the EMH. One such school of thought is behavioural finance, which holds that investors over-react and/or under-react over time, driven by their behavioural biases. The Barberis et al. (1998) theory of conservatism and the representativeness heuristic is used to explain investor over-reaction and under-reaction. Investors who exhibit conservatism are slow to update their beliefs in response to recent evidence, and thus under-react to information. Under the influence of the representativeness heuristic, investors tend to produce extreme predictions, and over-react, implying that stocks that under-performed in the past tend to out-perform in the future, and vice-versa (Aguiar et al., 2006). This study investigates whether South African investors tend to over-react and/or under-react over time, driven by their behavioural biases. The 100 shares with the largest market capitalisation at the end of every calendar year from 2006 to 2016 were considered for the study. These shares had sufficient liquidity and depth of coverage by analysts and investors to be considered for a study on behavioural finance. In total, a sample of 163 shares had sufficient financial statement data on the Iress and Bloomberg databases to be included in the study. Analyses were done using two mathematical statistical techniques, i.e. the more mathematical Fuzzy C-Means model and the Bayesian model, together with formal statistical tests.
The Fuzzy C-Means model is based on the technique of pattern recognition, and uses the well-known fuzzy c-means clustering algorithm. The Bayesian model is based on the classical Bayes’ theorem, which describes the probability of an event conditional upon another event. The stocks in the financial, industrial and resources sectors were analysed separately. Over-reaction and under-reaction were both detected, and differed across the three sectors. No clear patterns of the two biases investigated were visible over time. The results of the Fuzzy C-Means model analysis revealed that the resources sector shows the most under-reaction. In the Bayesian model, under-reaction was observed more often than over-reaction in the resources and industrial sectors. In the financial sector, over-reaction was observed more often. The results of this study imply that a momentum and a contrarian investment strategy can lead to over-performance in the South African equity market, but can also generate under-performance in a poorly performing market. Therefore, no trading strategies can be advised based on the results of this study.
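The fuzzy c-means clustering algorithm named in the summary above can be illustrated with a minimal one-dimensional sketch. It is not the thesis's model: the function name, the toy return values, the starting centres and the fuzzifier m = 2 are all assumptions made for illustration only.

```python
def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Minimal 1-D fuzzy c-means: returns (centers, memberships).

    Illustrative only; the inputs and settings are hypothetical,
    not taken from the thesis.
    """
    centers = list(centers)

    def memberships_for(current):
        rows = []
        for x in points:
            dists = [abs(x - c) for c in current]
            if 0.0 in dists:
                # A point sitting exactly on a centre belongs fully to it.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                rows.append([h / sum(hits) for h in hits])
            else:
                # Standard update: membership is proportional to d^(-2/(m-1)).
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                rows.append([v / sum(inv) for v in inv])
        return rows

    for _ in range(iterations):
        u = memberships_for(centers)
        for j in range(len(centers)):
            # Each centre is the mean of the points weighted by u^m.
            weights = [row[j] ** m for row in u]
            centers[j] = sum(w * x for w, x in zip(weights, points)) / sum(weights)
    return centers, memberships_for(centers)


# Hypothetical returns forming a "loser" group and a "winner" group.
returns = [-0.09, -0.08, -0.07, 0.06, 0.07, 0.08]
centers, memberships = fuzzy_c_means(returns, centers=[-0.01, 0.01])
```

Unlike hard clustering, every observation receives a degree of membership in each cluster, and the degrees for one observation sum to one.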
  • Item
    Feature selection for multi-label classification
    (Stellenbosch : Stellenbosch University, 2020-12) Contardo-Berning, Ivona E.; Steel, S. J.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
    ENGLISH ABSTRACT : The field of multi-label learning is a popular new research focus. In the multi-label setting, a data instance can be associated simultaneously with a set of labels instead of only a single label. This dissertation reviews the subject of multi-label classification, emphasising some of the notable developments in the field. The nature of multi-label datasets typically means that these datasets are complex and dimensionality reduction might aid in the analysis of these datasets. The notion of feature selection is therefore introduced and discussed briefly in this dissertation. A new procedure for multi-label feature selection is proposed. This new procedure, relevance pattern feature selection (RPFS), utilises the methodology of the graphical technique of Multiple Correspondence Analysis (MCA) biplots to perform feature selection. An empirical evaluation of the proposed technique is performed using a benchmark multi-label dataset and synthetic multi-label datasets. For the benchmark dataset it is shown that the proposed procedure achieves results similar to the full model, while using significantly fewer features. The empirical evaluation of the procedure on the synthetic datasets shows that the results achieved by the reduced sets of features are better than those achieved with a full set of features for the majority of the methods. The proposed procedure is then compared to two established multi-label feature selection techniques using the synthetic datasets. The results again show that the proposed procedure is effective.
  • Item
    Extreme quantile inference
    (Stellenbosch : Stellenbosch University, 2020-03) Buitendag, Sven; De Wet, Tertius; Beirlant, Jan; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
    ENGLISH SUMMARY : A novel approach to performing extreme quantile inference is proposed by applying ridge regression and the saddlepoint approximation to results in extreme value theory. To this end, ridge regression is applied to the log differences of the largest sample quantiles to obtain a bias-reduced estimator of the extreme value index, which is a parameter in extreme value theory that plays a central role in the estimation of extreme quantiles. The utility of the ridge regression estimators for the extreme value index is illustrated by means of simulation results and applications to daily wind speeds. A new pivotal quantity is then proposed with which a set of novel asymptotic confidence intervals for extreme quantiles is obtained. The ridge regression estimator for the extreme value index is combined with the proposed pivotal quantity together with the saddlepoint approximation to yield a set of confidence intervals that are accurate and narrow. The utility of these confidence intervals is illustrated by means of simulation results and applications to Belgian reinsurance data. Multivariate generalizations of sample quantiles are considered with the aim of developing multivariate risk measures, including maximum correlation risk measures and an estimator for the extreme value index. These multivariate sample quantiles are called center-outward quantiles, and are defined as an optimal transportation of the uniformly distributed points in the unit ball S_d to the observed sample points in R^d. A continuous extension of the center-outward quantile is proposed, which yields quantile contours that are nested. Furthermore, maximum correlation risk measures for multivariate samples are presented, as well as an estimator for the extreme value index for multivariate regularly varying samples. These results are applied to Danish fire insurance data and the stock returns of Google and Apple shares to illustrate their utility.
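The ridge regression estimator described above is not specified in this summary, so it is not reproduced here. As background only, the sketch below shows the classical Hill estimator, a different and simpler estimator of a positive extreme value index that averages the log differences between the k largest observations and a threshold. The function name, the synthetic Pareto-type sample and the choice k = 200 are assumptions.

```python
import math


def hill_estimator(sample, k):
    """Classical Hill estimate of a positive extreme value index.

    Uses the k largest observations; this is background material,
    not the bias-reduced ridge regression estimator of the thesis.
    """
    if not 0 < k < len(sample):
        raise ValueError("k must satisfy 0 < k < len(sample)")
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    return sum(math.log(x / threshold) for x in ordered[:k]) / k


# Synthetic, deterministic Pareto-type quantiles with true index 0.5
# (hypothetical data chosen so the example needs no random numbers).
gamma_true = 0.5
n = 2000
sample = [(1.0 - i / (n + 1)) ** (-gamma_true) for i in range(1, n + 1)]
estimate = hill_estimator(sample, k=200)
```

On this synthetic sample the estimate lies close to the true index of 0.5. In practice the result is sensitive to the choice of k, which is one motivation for bias-reduced estimators.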
  • Item
    Classifying yield spread movements in sparse data through triplots
    (Stellenbosch : Stellenbosch University, 2020-03) Van der Merwe, Carel Johannes; De Wet, Tertius; Inghelbrecht, Koen; Vanmaele, Michele; Conradie, W. J. (Willem Johannes); Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
    ENGLISH SUMMARY : In many developing countries, including South Africa, not all the data required to calculate the fair values of financial instruments are readily available. Additionally, in some instances, companies that do not have the necessary quantitative skills are reluctant to incorporate the correct fair valuation and fail to employ the appropriate techniques. This problem is most notable with regard to unlisted debt instruments. There are two main inputs to the valuation of unlisted debt instruments, namely the risk-free curve and the yield spread. Investigation into these two components forms the basis of this thesis. Firstly, an analysis is carried out to derive approximations of risk-free curves in areas where data are sparse. Thereafter it is investigated whether there is sufficient evidence of a significant change in yield spreads of unlisted debt instruments. In order to determine these changes, however, a new method that allows for simultaneous visualisation and classification of data was developed, termed triplot classification with polybags. This new classification technique also has the ability to limit misclassification rates. In the first paper, a proxy for the extended zero curve, calculated from other observable inputs, is found through a simulation approach by incorporating two new techniques, namely permuted integer multiple linear regression and aggregate standardised model scoring. It was found that a Nelson-Siegel fit, with a mixture of one-year forward rates as proxies for the long-term zero point, and some discarding of initial data points, performs relatively well in the training and testing data sets. This new method allows for the approximation of risk-free curves where no long-term points are available, and further allows the determinants of the yield curve shape to be derived by considering other available data.
The changes in these shape determining parameters are used in the final paper as determinants for changes in yield spreads. For the second paper, a new classification technique is developed that was used in the final paper. Classification techniques do not easily allow for visual interpretation, nor do they usually allow for the limitation of the false negative and positive error rates. For some areas of research and practical applications these shortcomings are important to address. In this paper, classification techniques are combined with biplots, allowing for simultaneous visual representation and classification of the data, resulting in the so-called triplot. By further incorporating polybags, the ability of limiting misclassification type errors is also introduced. A simulation study as well as an application is provided showing that the method provides similar results compared to existing methods, but with added visualisation benefits. The paper focuses purely on developing a statistical technique that can be applied to any field. The application that is provided, for example, is on a medical data set. In the final paper the technique is applied to changes in yield spreads. The third paper considered changes in yield spreads which were analysed through various covariates to determine whether significant decreases or increases would have been observed for unlisted debt instruments. The methodology does not specifically determine the new spread, but gives evidence on whether the initial implied spread could be left the same, or whether a new spread should be determined. These yield spread movements are classified using various share, interest rate, financial ratio, and economic type covariates in a visually interpretive manner. This also allows for a better understanding of how various factors drive the changes in yield spreads. 
Finally, as supplement to each paper, a web-based application was built allowing the reader to interact with all the data and properties of the methodologies discussed. The following links can be used to access these three applications:
- Paper 1: https://carelvdmerwe.shinyapps.io/ProxyCurve/
- Paper 2: https://carelvdmerwe.shinyapps.io/TriplotSimulation/
- Paper 3: https://carelvdmerwe.shinyapps.io/SpreadsTriplot/
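The Nelson-Siegel fit mentioned in the summary of the first paper refers to a standard parametric form for a zero curve. The sketch below evaluates that standard form only. The parameter values are hypothetical and are not taken from the thesis, and the proxy and model-scoring techniques the thesis develops are not shown.

```python
import math


def nelson_siegel(tau, beta0, beta1, beta2, lam):
    """Zero rate at maturity tau (in years) under the Nelson-Siegel form.

    beta0 is the long-term level, beta0 + beta1 the short-end limit,
    beta2 controls the hump, and lam sets where the hump occurs.
    """
    if tau <= 0:
        return beta0 + beta1  # limit as maturity tends to zero
    x = tau / lam
    loading = -math.expm1(-x) / x  # (1 - exp(-x)) / x, computed stably
    return beta0 + beta1 * loading + beta2 * (loading - math.exp(-x))


# Hypothetical parameters: 9% long-term rate, 6% short-end rate.
params = dict(beta0=0.09, beta1=-0.03, beta2=0.02, lam=2.0)
curve = {tau: nelson_siegel(tau, **params) for tau in (0.25, 1, 5, 10, 30)}
```

Because the curve tends to beta0 at long maturities, a proxy for the long-term zero point, as described in the summary, effectively pins down the level parameter when no long-term market points are observed.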
  • Item
    Biplot methodology for analysing and evaluating missing multivariate nominal scaled data
    (Stellenbosch : Stellenbosch University, 2019-12) Nienkemper-Swanepoel, Johane; Le Roux, N. J.; Lubbe, Sugnet; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
    ENGLISH ABSTRACT: This research aims at developing exploratory techniques that are specifically suitable for missing data applications. Categorical data analysis, missing data analysis and biplot visualisation are the three core methodologies that are combined to develop novel techniques. Variants of multiple correspondence analysis (MCA) biplots are used for all visualisations. The first study objective addresses exploratory analysis after multiple imputation (MI). Multiple plausible values are imputed for each missing observation to construct multiple completed data sets for standard analyses. Biplot visualisations are constructed for each completed data set after MI, and these require individual exploration to obtain final inference. The number of MIs will greatly affect the accuracy and consistency of the interpretations obtained from several plots. This predicament led to the development of GPAbin, to optimally combine configurations from MIs to obtain a single configuration for final inference. The GPAbin approach builds on two statistical techniques: generalised orthogonal Procrustes analysis (GPA) and Rubin’s rules, the combining rules used to combine estimates obtained from MIs. Albeit a superior missing data handling approach, MI could be daunting for the non‐technical practitioner. Therefore, an adequate alternative approach could be appealing and contribute to the variety of available methods for the handling of incomplete multivariate categorical data. The second objective aims at confirming whether visualisations obtained from non-imputed data sets are a suitable alternative to visualisations obtained from MIs. Subset MCA (sMCA) distinguishes between observed and missing subsets of a multivariate categorical data set by creating an additional response category level (CL) for missing responses in the indicator matrix.
Missing and observed responses can be visualised separately by only considering the subset of interest in the recoded indicator matrix. The visualisation of the observed responses utilises all available information which would have been forfeited by deletion methods. The third study objective explores the possibility of predicting a complete multivariate categorical data set from MI visualisations obtained from the first study objective. The distances between the coordinates of a biplot in the full space are used to predict plausible responses. Since the aim of this research is to advance missing data visualisations, the visualisations obtained from predicted completed data sets are compared to visualisations of simulated complete data sets. The emphasis is on preserving inference and not recreating the original data. Missing data techniques are typically developed to address a specific missing data problem. It is therefore crucial to understand the cause of missingness in order to apply suitable missing data techniques. The fourth study objective investigates the sMCA biplot of the missing subset of the recoded indicator matrix. Configurations of the incomplete subsets enable the recognition of non‐response patterns which could provide insight into the particular missing data mechanism (MDM). The missing at random (MAR) MDM refers to missing responses that are dependent on the observed information and is expected to be identified by patterns and groupings occurring in the incomplete sMCA biplot. The missing completely at random (MCAR) MDM states that all observations have the same probability of not being captured which could be identified by a random cloud of points in the incomplete sMCA biplot. Cluster analysis is applied to confirm distinguishable groupings in the incomplete sMCA biplot which could be used as a guideline to identify the MDM. 
The proposed methodologies to address the different study objectives are evaluated by means of an extensive simulation study comprising various sample sizes, numbers of variables and varying numbers of CLs, which are simulated from three different distributions. The findings of the simulation study are applied to a real data set to serve as a guide for the analysis. Functions have been developed for R statistical software to perform all methodology presented in this research. They are included as a tool pack provided as an appendix to assist in the correct handling and unbiased visualisation of multivariate categorical data with missing observations. Keywords: biplots; categorical data; missing data; multiple correspondence analysis; multiple imputation; Procrustes analysis.
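The recoding step described for subset MCA, in which each variable gains an additional category level for missing responses in the indicator matrix, can be sketched as follows. This shows the recoding and subset selection only, not the sMCA computation itself or the thesis's R functions. The function name, the "NA" label and the toy data are assumptions.

```python
def indicator_matrix(rows, missing=None, missing_label="NA"):
    """One-hot encode categorical rows, adding a level for missing values.

    Returns (columns, matrix), where columns lists (variable, level) pairs.
    Assumes no observed level is already named like missing_label.
    """
    n_vars = len(rows[0])
    columns = []
    for j in range(n_vars):
        observed = sorted({row[j] for row in rows if row[j] is not missing})
        has_missing = any(row[j] is missing for row in rows)
        levels = observed + ([missing_label] if has_missing else [])
        columns.extend((j, level) for level in levels)
    matrix = []
    for row in rows:
        coded = [missing_label if value is missing else value for value in row]
        matrix.append([1 if coded[j] == level else 0 for j, level in columns])
    return columns, matrix


# Hypothetical responses on two nominal variables; None marks non-response.
responses = [("a", "x"), ("b", None), (None, "y"), ("a", "x")]
columns, matrix = indicator_matrix(responses)

# Column positions of the missing and observed subsets.
missing_idx = [i for i, (_, level) in enumerate(columns) if level == "NA"]
observed_idx = [i for i in range(len(columns)) if i not in missing_idx]
missing_subset = [[row[i] for i in missing_idx] for row in matrix]
```

Every row of the recoded matrix still sums to the number of variables, so no respondent is dropped, which is the sense in which the observed subset uses information that deletion methods would forfeit.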