A hyperheuristic approach towards the training of artificial neural networks

Nel, Gerrit Stephanus (2021-03)

Thesis (PhD)--Stellenbosch University, 2021.

Thesis

ENGLISH ABSTRACT: In 2015, approximately 2.5 × 10^18 bytes of data were generated daily. The enormity and nature of these data have laid bare the inadequacies of standard data analytic approaches. Researchers and practitioners have long been unequipped with the necessary means to extract insight from the vast amounts of data at their disposal - until now, that is. Recent advances within the domain of artificial intelligence have ushered in a new era, providing the essential connective tissue between data and analysis. These advances can be attributed to instrumental research conducted within the field of machine learning, research that has provided algorithms with the inherent ability to learn. A groundbreaking algorithm at the forefront of the current machine learning impetus is the artificial neural network. Artificial neural networks are computational models inspired by biological neural networks. This process of neurological emulation enables artificial neural networks to gain an ability intrinsic to their muse, namely to learn from experience. A characteristic that distinguishes this algorithm from other machine learning algorithms is the efficiency and effectiveness with which it can recognise complex patterns and abstractions within data. The process according to which this algorithm recognises patterns from data is called training and is arguably its most intriguing facet. Conventionally, the method of gradient descent (or steepest descent) is employed to find good network parameter values. This choice, however, limits the level of abstraction at which optimisation can take place. A gradient-free approach offers a good alternative. More specifically, the research field of metaheuristics provides powerful optimisation techniques that are applicable in the context of training artificial neural networks.
A metaheuristic optimisation approach allows for far greater freedom during artificial neural network training - the network weights, its structure, and its activation functions can be optimised concurrently. This versatility of metaheuristics, as well as their proven capability in many optimisation contexts, serves as justification for why they feature centrally in this dissertation. A challenge to all optimisation approaches, however, relates to the decision of which algorithm to employ for this purpose. Fortunately, the relatively new and promising field of hyperheuristics provides the necessary means to circumvent this challenge - a hyperheuristic is essentially a heuristic that chooses heuristics. The hyperheuristic considered in this dissertation is called the AMALGAM method. AMALGAM is a powerful and robust optimisation approach that delivers significant performance improvements (approaching a factor of ten), whilst enhancing the level of general applicability over various benchmark problems. This hyperheuristic has not been applied in the literature to the optimisation problem of training artificial neural networks in respect of their network weights, network structure, and activation functions concurrently. An AMALGAM-based hyperheuristic training algorithm is therefore proposed in this dissertation. The novelty of the problem under investigation, however, necessitates a new mathematical learning model. In addition, novel modifications are made to AMALGAM so as to enable its use in neural network training. A bi-objective hyperheuristic training algorithm is designed, in which the main objective represents a novel network performance measure while a secondary, so-called helper objective is incorporated to guide the search process. A test suite, comprising several data sets, is created in order to evaluate the efficacy of the proposed training algorithm.
Three extensive parameter evaluations are performed so as to gain insight into algorithmic performance under different conditions. An in-depth algorithmic performance comparison is also performed, during which the performance achieved by the proposed hyperheuristic training algorithm is compared with that of its constituent sub-algorithms. The robustness of the proposed approach is also validated by means of a meta-generalisation analysis. A comparison between the hyperheuristic training algorithm and powerful gradient-based training algorithms is performed, supplemented by an investigation into the potential consolidation of the hyperheuristic approach with the best gradient-based algorithm. An in-depth investigation is launched into the temporal dynamics of the hyperheuristic's sub-algorithms with a view to gaining new insight into this novel approach towards training artificial neural networks and to predicting algorithmic performance. A demonstration of how the working of the hyperheuristic can be improved by means of the prediction model is also provided. The structural attributes related to favourable networks produced by the hyperheuristic are analysed with a view to gaining new insight into the working of the hyperheuristic.
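
The idea of "a heuristic that chooses heuristics", applied to gradient-free training, can be illustrated with a minimal Python sketch. It is not the AMALGAM method and not the algorithm proposed in the thesis: it is single-objective, keeps a single candidate solution, and optimises only the weights of one sigmoid neuron. The toy data, the two perturbation heuristics (`small_step`, `large_step`) and the success-count `credit` rule are all assumptions made purely for illustration.

```python
import math
import random

# Toy data: the logical OR function (an assumption chosen only for illustration).
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]


def loss(weights):
    """Mean squared error of one sigmoid neuron with weights (w1, w2, bias)."""
    total = 0.0
    for (x1, x2), target in DATA:
        z = weights[0] * x1 + weights[1] * x2 + weights[2]
        z = max(-60.0, min(60.0, z))  # clamp to keep math.exp well within range
        output = 1.0 / (1.0 + math.exp(-z))
        total += (output - target) ** 2
    return total / len(DATA)


def small_step(weights, rng):
    """Hypothetical low-level heuristic: fine-grained Gaussian perturbation."""
    return [w + rng.gauss(0.0, 0.1) for w in weights]


def large_step(weights, rng):
    """Hypothetical low-level heuristic: coarse Gaussian perturbation."""
    return [w + rng.gauss(0.0, 1.0) for w in weights]


def train(iterations=300, seed=0):
    """Gradient-free training in which a selector picks the heuristic to apply."""
    rng = random.Random(seed)
    heuristics = [small_step, large_step]
    credit = [1.0, 1.0]  # initial credit keeps every heuristic selectable
    current = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    current_loss = loss(current)
    initial_loss = current_loss
    for _ in range(iterations):
        # Hyperheuristic step: choose a heuristic with probability
        # proportional to the number of improvements it has produced so far.
        index = rng.choices(range(len(heuristics)), weights=credit)[0]
        candidate = heuristics[index](current, rng)
        candidate_loss = loss(candidate)
        if candidate_loss < current_loss:  # accept only strict improvements
            current, current_loss = candidate, candidate_loss
            credit[index] += 1.0
    return initial_loss, current_loss, credit


if __name__ == "__main__":
    start, end, credit = train()
    print(f"loss before: {start:.4f}, after: {end:.4f}, credit: {credit}")
```

Because only improving candidates are accepted, the final loss can never exceed the initial loss, and the `credit` list records how often each heuristic contributed an improvement.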

AFRIKAANSE OPSOMMING: In 2015 is daar ongeveer 2.5 × 10^18 grepe data op 'n daaglikse basis gegenereer. Die omvang en aard van hierdie data het die tekortkominge van standaard data-analitiese benaderings blootgelê. Navorsers en praktisyns het lank nie oor die nodige middele beskik om insig uit die groot hoeveelhede data tot hulle beskikking, te verkry nie - tot nou toe. As gevolg van onlangse vordering binne die vakgebied van kunsmatige intelligensie het 'n nuwe era aangebreek wat die nodige bindweefsel tussen data en analise verskaf. Hierdie vordering kan toegeskryf word aan instrumentele navorsing in die gebied van masjienleer, navorsing waarin algoritmes wat die inherente vermoë het om te leer, die lig gesien het. 'n Baanbrekende algoritme aan die voorpunt van die huidige masjienleer-momentum is die kunsmatige neurale netwerk. Kunsmatige neurale netwerke is berekeningsmodelle wat deur biologiese neurale netwerke geïnspireer is. Hierdie proses van neurologiese nabootsing stel kunsmatige neurale netwerke in staat om 'n vermoë te ontwikkel wat eie is aan hul muse - naamlik om uit ervaring te leer. 'n Eienskap wat hierdie algoritme van ander masjienleeralgoritmes onderskei, is die doeltreffendheid en effektiwiteit waarmee dit komplekse patrone en abstraksies binne data kan herken. Die proses waarvolgens hierdie algoritme patrone uit data herken, word leer genoem en is waarskynlik die mees interessante faset daarvan. Gewoonlik word die gradiënt-dalingsmetode (of die steilste-hellingmetode) gebruik om goeie netwerkparameterwaardes te vind. Die vlak van abstraksie waarby optimering sodoende kan plaasvind, is egter beperk. 'n Gradiënt-vrye benadering, daarenteen, bied 'n goeie alternatief. Meer spesifiek verskaf die navorsingsveld van metaheuristieke kragtige optimeringstegnieke wat in die konteks van kunsmatige neurale netwerkleer toepaslik is.
'n Metaheuristiese optimeringsbenadering maak voorsiening vir veel groter vryheid tydens kunsmatige neurale netwerk-leer - die netwerkgewigte, die netwerkstruktuur en die aktiveringsfunksies van die netwerk kan gelyktydig só geoptimeer word. Hierdie veelsydigheid van metaheuristieke, sowel as hul bewese vermoë in verskeie optimeringskontekste, dien as motivering vir hul kern-oorweging in hierdie proefskrif. 'n Uitdaging vir alle optimeringsbenaderings het egter betrekking op die besluit oor watter metaheuristiek om vir hierdie doel in te span. Gelukkig bied die relatiewe nuwe en belowende studieveld van hiperheuristieke die nodige middele om hierdie uitdaging te oorkom - 'n hiperheuristiek is in wese 'n heuristiek wat heuristieke kies. Die hiperheuristiek wat in hierdie proefskrif oorweeg word, word die AMALGAM-metode genoem. AMALGAM is 'n kragtige en robuuste optimeringsbenadering wat beduidende prestasieverbeterings (met 'n faktor van tot tien) bied, terwyl die vlak van algemene toepaslikheid oor verskeie toetsprobleme verbeter. Hierdie hiperheuristiek is nog nie in die literatuur op die optimeringsprobleem van kunsmatige neurale netwerk-leer toegepas waarin netwerkgewigte, netwerkstruktuur en aktiveringsfunksies gelyktydig bepaal word nie. 'n AMALGAM-gebaseerde hiperheuristiese leeralgoritme word dus in hierdie proefskrif daargestel. Die oorspronklikheid van die probleem wat ondersoek word, vereis egter dat 'n nuwe wiskundige leermodel geformuleer word. Daarbenewens word nuwe veranderinge aan AMALGAM voorgestel sodat die algoritme vir neurale netwerk-leer ingespan kan word. 'n Tweedoelige hiperheuristiese leeralgoritme word ontwerp waarin die hoofdoel 'n netwerkprestasiemaatstaf verteenwoordig terwyl 'n sekondêre, sogenaamde hulpdoel daarop gemik is om die optimeringsoekproses te lei. 'n Versameling toetsprobleme, bestaande uit verskeie datastelle, word geskep om die doeltreffendheid van die voorgestelde leeralgoritme te evalueer.
Drie omvattende parameterevaluerings word uitgevoer om sodoende insig te verkry in algoritmiese prestasie onder verskillende omstandighede. Daar word ook 'n diepgaande algoritmiese prestasievergelyking uitgevoer waartydens die prestasie wat deur die voorgestelde hiperheuristiese leeralgoritme bereik word, vergelyk word met dié van sy deelalgoritmes. Die robuustheid van die voorgestelde benadering word ook deur middel van 'n meta-veralgemeningsanalise gevalideer. 'n Vergelyking tussen die hiperheuristiese leeralgoritme en kragtige gradiëntgebaseerde leeralgoritmes word verder uitgevoer en aangevul deur 'n ondersoek na die moontlike konsolidering van die hiperheuristiese benadering met die beste gradiëntgebaseerde algoritme. 'n In-diepte ondersoek na die temporale dinamika van die hiperheuristiek se deelalgoritmes word geloots om insig in hierdie nuwe benadering tot kunsmatige neurale netwerk-leer te verkry en om algoritmiese prestasie te voorspel. 'n Demonstrasie van hoe die werking van die hiperheuristiek deur middel van 'n voorspellingsmodel verbeter kan word, word ook gelewer. Die strukturele kenmerke wat verband hou met gunstige netwerke wat deur die hiperheuristiek gegenereer word, word geanaliseer met die oog op nuwe insig in die werking van die hiperheuristiek.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/109792