Improving the accuracy of prediction using singular spectrum analysis by incorporating internet activity

Badenhorst, Dirk Jakobus Pretorius (2013-03)

Thesis (MComm)--Stellenbosch University, 2013.

Thesis

ENGLISH ABSTRACT: Researchers and investors have been attempting to predict stock market activity for years. The possible financial gain that accurate predictions would offer lit a flame of greed and drive that would inspire all kinds of researchers. However, after many of these researchers have failed, they started to hypothesize that a goal such as this is not only improbable, but impossible. Previous predictions were based on historical data of the stock market activity itself and would often incorporate different types of auxiliary data. This auxiliary data ranged as far as imagination allowed in an attempt to find some correlation and some insight into the future, that could in turn lead to the figurative pot of gold. More often than not, the auxiliary data would not prove helpful. However, with the birth of the internet, endless amounts of new sources of auxiliary data presented itself. In this thesis I propose that the near in finite amount of data available on the internet could provide us with information that would improve stock market predictions. With this goal in mind, the different sources of information available on the internet are considered. Previous studies on similar topics presented possible ways in which we can measure internet activity, which might relate to stock market activity. These studies also gave some insights on the advantages and disadvantages of using some of these sources. These considerations are investigated in this thesis. Since a lot of this work is therefore based on the prediction of a time series, it was necessary to choose a prediction algorithm. Previously used linear methods seemed too simple for prediction of stock market activity and a new non-linear method, called Singular Spectrum Analysis, is therefore considered. A detailed study of this algorithm is done to ensure that it is an appropriate prediction methodology to use. Furthermore, since we will be including auxiliary information, multivariate extensions of this algorithm are considered as well. Some of the inaccuracies and inadequacies of these current multivariate extensions are studied and an alternative multivariate technique is proposed and tested. This alternative approach addresses the inadequacies of existing methods. With the appropriate methodology chosen and the appropriate sources of auxiliary information chosen, a concluding chapter is done on whether predictions that includes auxiliary information (obtained from the internet) improve on baseline predictions that are simply based on historical stock market data.

AFRIKAANSE OPSOMMING: Navorsers en beleggers is vir jare al opsoek na maniere om aandeelpryse meer akkuraat te voorspel. Die moontlike finansiële implikasies wat akkurate vooruitskattings kan inhou het 'n vlam van geldgierigheid en dryf wakker gemaak binne navorsers regoor die wêreld. Nadat baie van hierdie navorsers onsuksesvol was, het hulle begin vermoed dat so 'n doel nie net onwaarskynlik is nie, maar onmoontlik. Vorige vooruitskattings was bloot gebaseer op historiese aandeelprys data en sou soms verskillende tipes bykomende data inkorporeer. Die tipes data wat gebruik was het gestrek so ver soos wat die verbeelding toegelaat het, in 'n poging om korrelasie en inligting oor die toekoms te kry wat na die guurlike pot goud sou lei. Navorsers het gereeld gevind dat hierdie verskillende tipes bykomende inligting nie van veel hulp was nie, maar met die geboorte van die internet het 'n oneindige hoeveelheid nuwe bronne van bykomende inligting bekombaar geraak. In hierdie tesis stel ek dus voor dat die data beskikbaar op die internet dalk vir ons kan inligting gee wat verwant is aan toekomstige aandeelpryse. Met hierdie doel in die oog, is die verskillende bronne van inligting op die internet gebestudeer. Vorige studies op verwante werk het sekere spesifieke maniere voorgestel waarop ons internet aktiwiteit kan meet. Hierdie studies het ook insig gegee oor die voordele en die nadele wat sommige bronne inhou. Hierdie oorwegings word ook in hierdie tesis bespreek. Aangesien 'n groot gedeelte van hierdie tesis dus gebasseer word op die vooruitskatting van 'n tydreeks, is dit nodig om 'n toepaslike vooruitskattings algoritme te kies. Baie navorsers het verkies om eenvoudige lineêre metodes te gebruik. Hierdie metodes het egter te eenvoudig voorgekom en 'n relatiewe nuwe nie-lineêre metode (met die naam "Singular Spectrum Analysis") is oorweeg. 'n Deeglike studie van hierdie algoritme is gedoen om te verseker dat die metode van toepassing is op aandeelprys data. Verder, aangesien ons gebruik wou maak van bykomende inligting, is daar ook 'n studie gedoen op huidige multivariaat uitbreidings van hierdie algoritme en die probleme wat dit inhou. 'n Alternatiewe multivariaat metode is toe voorgestel en getoets wat hierdie probleme aanspreek. Met 'n gekose vooruitskattingsmetode en gekose bronne van bykomende data is 'n gevolgtrekkende hoofstuk geskryf oor of vooruitskattings, wat die bykomende internet data inkorporeer, werklik in staat is om te verbeter op die eenvoudige vooruitskattings, wat slegs gebaseer is op die historiese aandeelprys data.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/80056
This item appears in the following collections: