The importance of input features on deep neural networks when predicting foreign exchange rates

Date
2021-03
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT: Data has become a form of currency that can govern both the success and failure of almost every business, individual or idea. Raw, unprocessed financial data forms the basis for new discoveries in this thesis. Since financial data is non-stationary and filled with noise, careful consideration is required when selecting forecasting models. Machine learning (ML) has grown from just a concept to a leading analysis/prediction tool used in almost every industry in the world. The immense volume of data generated, mined and collected is the fuel that keeps the interest in and development of ML alive. Without data, ML would not have been able to advance in the way that it has. However, a wealth of data does not imply that all data is relevant and/or important. Selecting input variables for ML models is vitally important. The effect and importance of input features were investigated on three different neural network (NN) architectures: an LSTM model and two hybrid models, a CNN-LSTM and a Multi-Head CNN-LSTM. The different tests, each of which used different input features, and the overall best-performing NNs were compared using prediction accuracy, mean squared error (MSE), adjustment of the accuracy threshold and classification accuracy. The input features that were tested included: the open, high, low and close prices, 9-day and 21-day moving averages, the price difference, Relative Strength Index, Heikin Ashi, Ichimoku Kinko Cloud, Bollinger Bands, 3-, 6- and 9-month implied volatility and risk reversal, first and second differences, and features determined through principal component analysis. As NNs are sensitive to network architectures, several architectures were also investigated for each input feature, giving each test the opportunity to find a possible optimal configuration.
The Multi-Head model obtained the best overall prediction accuracy of 25.6% when Bollinger Bands were added to the baseline input features: the open, high, low and closing prices of the USD/ZAR exchange rate. However, the Multi-Head model was outperformed by multiple regression, which obtained a prediction accuracy of 30.4% using features obtained through principal component analysis, with a binary increase input feature having the greatest influence on the predictions made.
AFRIKAANS ABSTRACT (English translation): Data has become a form of currency that can determine the success and failure of almost every business, individual or idea. Raw, unprocessed financial data forms the basis for the new discoveries made in this thesis. Since financial data is non-stationary and filled with noise, careful consideration must be given to the choice of forecasting models. Machine learning (ML) has grown from a mere concept to a leading analysis/prediction tool used in almost every industry in the world. The large volume of data that is generated, mined and collected is the fuel that keeps the interest in and development of ML alive. Without data, ML could not have advanced as it has. However, an abundance of data does not imply that all data is relevant and/or important. The choice of input variables for ML models is of the utmost importance. The effect and importance of input features were investigated on three different neural network (NN) architectures: an LSTM model and two hybrid models, a CNN-LSTM and a Multi-Head CNN-LSTM. Using the mean squared error (MSE) metric, adjustment of the accuracy threshold and classification accuracy, a comparison was made between the different tests, which used different input features, and the overall performance of the NNs. The input features that were tested included: the open, high, low and close prices, 9-day and 21-day moving averages, the price difference, Relative Strength Index, Heikin Ashi, Ichimoku Kinko Cloud, Bollinger Bands, 3-, 6- and 9-month implied volatility and risk reversal, first and second differences, and features determined through principal component analysis. As NNs are sensitive to network architectures, several architectures were also investigated for each set of input features, giving each test the opportunity to find a possible optimal configuration.
The Multi-Head model delivered the best overall prediction accuracy of 25.6% when Bollinger Bands were added to the baseline input features, namely the open, high, low and closing prices of the USD/ZAR exchange rate. However, the Multi-Head model was outperformed by multiple regression, which achieved a prediction accuracy of 30.4% using features obtained through principal component analysis, with a binary increase input feature having the greatest influence on the predictions made.
Description
Thesis (MComm)--Stellenbosch University, 2021.
Keywords
Neural networks (Computer science), Machine learning, Business networks, Corporations finance, Foreign exchange market, Foreign exchange options, UCTD