Combinatorial evolution of feedforward neural network models for chemical processes

Date
1999-11
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT: Neural networks, in particular feedforward neural network architectures such as multilayer perceptrons and radial basis function networks, have been used successfully in many chemical engineering applications. A number of techniques exist with which such neural networks can be trained. These include backpropagation, k-means clustering and evolutionary algorithms. The last of these is particularly useful, as it is able to avoid local optima in the search space and can optimise parameters for which no gradient information exists. Unfortunately, only moderately sized networks can be trained by this method, because evolutionary optimisation is extremely computationally intensive. In this thesis, a novel algorithm called combinatorial evolution of regression nodes (CERN) is proposed for training non-linear regression models, such as neural networks. This evolutionary algorithm uses a branch-and-bound combinatorial search in the selection scheme to optimise groups of neural nodes. The use of a combinatorial search over a set of basis nodes in the optimisation of neural networks is a concept introduced for the first time in this thesis. By searching over groups of nodes rather than ordered sequences of them, it automatically solves the problem of permutational redundancy associated with training the hidden layer of a neural network. CERN was further enhanced by using clustering, which actively supports niches in the population. This also enabled the optimisation of the node types used in the hidden layers, which need not be the same for every node (i.e. a mixed layer of different node types can be found). One restriction does apply: to make the combinatorial search efficient enough, the output layer of the neural network must be linear. CERN was found to be significantly more efficient than a conventional evolutionary algorithm that does not use a combinatorial search.
It also trained faster than backpropagation with momentum and an adaptive learning rate. Although the Levenberg-Marquardt algorithm is still significantly faster than CERN, it struggled to train in the presence of many non-local minima. Furthermore, the Levenberg-Marquardt learning rule tends to overtrain (see below) and requires gradient information. CERN was evaluated on seven real-world and six synthetic data sets. Oriented ellipsoidal basis nodes optimised with CERN achieved significantly better accuracy with fewer nodes than spherical basis nodes optimised by means of k-means clustering. On the test data, multilayer perceptrons optimised by CERN were more accurate than those trained by the gradient descent techniques, namely backpropagation with momentum and the Levenberg-Marquardt update rule. The networks of CERN were also compared with the splines of MARS and were found to generalise as well as or significantly better than MARS. However, for some data sets, MARS was used to select the input variables for the neural network models. Networks of ellipsoidal basis functions built by CERN were more compact and more accurate than radial basis function networks trained using k-means clustering. Moreover, the ellipsoidal nodes can be translated into fuzzy systems. The generalisation and complexity of the resulting fuzzy rules were comparable to those of fuzzy systems optimised by ANFIS, but, unlike ANFIS, CERN did not produce an exponential increase in the number of rules. That increase is caused by the grid partitioning employed by ANFIS; consequently, for data sets with a relatively high dimensionality in comparison with the number of data points, the generalisation of ANFIS was much poorer than that of the CERN models. In summary, the proposed combinatorial selection scheme made an existing evolutionary algorithm significantly faster for neural network optimisation. This made it computationally competitive with traditional gradient descent based techniques.
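The abstract contrasts oriented ellipsoidal basis nodes with spherical ones and states that ellipsoidal nodes can be translated into fuzzy rules without the exponential rule growth that grid partitioning causes in ANFIS. The sketch below assumes a Gaussian node shape, which the abstract does not specify, and the functions `ellipsoidal_node`, `spherical_node` and `gaussian_mf` are hypothetical names. It checks two facts under that assumption: a spherical node is the special case R = I/σ, and an axis-aligned ellipsoidal node factorises into a product of one-dimensional Gaussian membership functions, i.e. the firing strength of one rule "IF x1 is A1 AND x2 is A2 THEN y = w" with product as the AND operator. For an oriented node the same factorisation only holds in a rotated coordinate frame, so the antecedents would refer to linear combinations of the inputs; the abstract does not say how the thesis handles this.

```python
import numpy as np


def ellipsoidal_node(x, c, R):
    """Oriented ellipsoidal Gaussian node (assumed form):
    phi(x) = exp(-||R (x - c)||^2) for a square matrix R.
    R scales and rotates the input space, so the contours are ellipsoids
    whose axes need not line up with the input axes."""
    z = (x - c) @ R.T
    return np.exp(-np.sum(z * z, axis=-1))


def spherical_node(x, c, sigma):
    """Spherical (radial) Gaussian node: exp(-||x - c||^2 / sigma^2)."""
    d = x - c
    return np.exp(-np.sum(d * d, axis=-1) / sigma**2)


def gaussian_mf(xi, ci, width):
    """One-dimensional Gaussian membership function of a fuzzy set."""
    return np.exp(-(((xi - ci) / width) ** 2))


x = np.array([[0.3, -0.2], [1.0, 0.5]])  # two sample input points
c = np.array([0.0, 0.1])                  # node centre
sigma = 0.8

# A spherical node is the special case R = I / sigma.
assert np.allclose(ellipsoidal_node(x, c, np.eye(2) / sigma),
                   spherical_node(x, c, sigma))

# An axis-aligned node (diagonal R) equals the product of 1-D memberships,
# i.e. the firing strength of "IF x1 is A1 AND x2 is A2" with product AND.
widths = np.array([0.5, 2.0])  # hypothetical per-input widths
firing = gaussian_mf(x[:, 0], c[0], widths[0]) * gaussian_mf(x[:, 1], c[1], widths[1])
assert np.allclose(ellipsoidal_node(x, c, np.diag(1.0 / widths)), firing)

# Rule counts: one rule per node here, versus m**d rules for a grid
# partition with m membership functions on each of d inputs (m = 3 below).
grid_rules = {d: 3**d for d in (2, 5, 10)}
print(grid_rules)  # {2: 9, 5: 243, 10: 59049}
```

The last lines show why grid partitioning scales poorly: with three membership functions per input, ten inputs already require 59049 rules, whereas a node-based translation yields one rule per hidden node regardless of the input dimension.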
Being an evolutionary algorithm, the proposed technique does not require a gradient and can therefore optimise a larger set of parameters in comparison to traditional techniques.
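The closing claim is that, because an evolutionary method needs no gradient, it can optimise parameters that gradient descent cannot, such as the type of each hidden node. The toy sketch below illustrates only that point, using a plain (1+1) evolution strategy rather than CERN itself: the node type is a discrete gene, the two continuous parameters are perturbed with Gaussian noise, and a child replaces its parent only if its squared error is no worse. The one-node model, the target exp(-x²), the mutation settings and all names (`node_output`, `sse`, `mutate`, `evolve`) are hypothetical.

```python
import math
import random

# Hypothetical one-node model: the node type is a discrete gene, so no
# gradient exists with respect to it.
NODE_TYPES = ("sigmoid", "gaussian")


def node_output(node, x):
    a, b = node["params"]
    if node["type"] == "sigmoid":
        return math.tanh(a * x + b)
    return math.exp(-(a * (x - b)) ** 2)


def sse(node, xs, ys):
    """Squared error after fitting the linear output weight by least squares."""
    h = [node_output(node, x) for x in xs]
    hh = sum(v * v for v in h)
    w = sum(v * y for v, y in zip(h, ys)) / hh if hh > 0 else 0.0
    return sum((y - w * v) ** 2 for v, y in zip(h, ys))


def mutate(node, rng, step=0.1, p_type=0.2):
    """Perturb the continuous parameters; occasionally switch the node type."""
    child = {"type": node["type"],
             "params": [p + rng.gauss(0.0, step) for p in node["params"]]}
    if rng.random() < p_type:
        child["type"] = rng.choice([t for t in NODE_TYPES if t != node["type"]])
    return child


def evolve(xs, ys, generations=200, seed=0):
    """(1+1) evolution strategy: keep the child only if it is no worse."""
    rng = random.Random(seed)
    parent = {"type": "sigmoid", "params": [1.0, 0.0]}
    best = sse(parent, xs, ys)
    for _ in range(generations):
        child = mutate(parent, rng)
        err = sse(child, xs, ys)
        if err <= best:
            parent, best = child, err
    return parent, best


xs = [i / 10 for i in range(-30, 31)]
ys = [math.exp(-x * x) for x in xs]
initial = sse({"type": "sigmoid", "params": [1.0, 0.0]}, xs, ys)
node, err = evolve(xs, ys)
print(node["type"], err, initial)
```

Because acceptance is elitist, the returned error can never exceed the starting error. Switching the node type is a discrete move that only a fitness-based search of this kind can make.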
AFRIKAANS SUMMARY: Neural networks, particularly those with feedforward architectures such as multilayer perceptrons and radial basis function networks, have already been applied successfully in various chemical engineering applications. A number of techniques exist with which such networks can be developed. These include backpropagation, k-means clustering and evolutionary algorithms. The last method is particularly useful because it is able to avoid local minima in the search space and can optimise parameters for which no gradient information exists. Unfortunately, only networks of moderate size can be trained in this way, since evolutionary optimisation is extremely computationally intensive. In this thesis a new algorithm, called combinatorial evolution of regression nodes (CERN), is proposed for training non-linear regression models such as neural networks. This evolutionary algorithm uses a "branch-and-bound" combinatorial search in the selection scheme to optimise groups of neural nodes. The use of a combinatorial search for a set of basis nodes in the optimisation of neural networks is a concept proposed for the first time in this thesis. In this way it solves the problem of permutational redundancy associated with training the hidden layer of a neural network. CERN was further improved by the use of clustering, which actively supports niches in the population. This made it possible to optimise the node types in the hidden layer, which need not all be the same (i.e. a mixed layer of different node types can be found). A prevailing restriction is that, in order to make the combinatorial search efficient enough, the output layer must be linear. Experiments showed that CERN was significantly more efficient than a conventional evolutionary algorithm that does not use a combinatorial search.
It also trained faster than backpropagation with momentum and an adaptive learning rate. Although the Levenberg-Marquardt algorithm is still faster than CERN, it struggled to train in the presence of many non-local minima. Furthermore, the Levenberg-Marquardt algorithm tended to overfit (see below) and it requires gradient information. CERN was investigated by means of seven real and six artificial data sets. Oriented ellipsoidal basis nodes optimised with CERN achieved significantly better accuracy with fewer nodes than spherical nodes optimised with k-means clustering. Multilayer perceptrons optimised with CERN were more accurate than those trained with gradient descent techniques, namely backpropagation with momentum and the Levenberg-Marquardt update rule. The networks of CERN were also compared with the splines of MARS (multivariate adaptive regression splines) and generalised significantly better than MARS. For some data sets, however, MARS was used to select the input variables for the neural networks. Networks consisting of ellipsoidal basis functions built with CERN were more compact and accurate than radial basis function networks trained with k-means clustering. In addition, the ellipsoidal nodes can be translated into fuzzy systems. The generalisation and complexity of the corresponding fuzzy rules were comparable to those of fuzzy systems optimised with ANFIS, but did not result in an exponential increase in the number of nodes. Such an increase is caused by the grid partitioning used by ANFIS, and for data sets with a relatively high dimensionality in relation to the number of data points, the resulting generalisation was therefore much poorer than that of the CERN models. In summary, the proposed combinatorial selection scheme was able to make an existing evolutionary optimisation algorithm for neural networks significantly faster.
This made the selection scheme competitive with traditional gradient descent techniques. Being an evolutionary algorithm, the proposed technique does not need a gradient and can therefore optimise a larger set of parameters than traditional techniques.
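The abstract attributes CERN's speed to a branch-and-bound combinatorial search in the selection scheme and notes that the output layer must be linear for that search to be efficient. One way to see why: once the hidden nodes are fixed, the optimal linear output weights follow from a least-squares fit, so any group of candidate nodes can be scored directly. The sketch below is a generic branch-and-bound best-subset search of the kind long used for feature selection, not the thesis's own implementation; it assumes the fitness is the training sum of squared errors and that a pool of candidate nodes has already been evaluated into a matrix `H`. The names `rss` and `best_subset` are hypothetical. Because groups are treated as unordered sets, each group is scored once, which is how a set-based search avoids permutational redundancy.

```python
import numpy as np


def rss(H, y, cols):
    """Sum of squared errors of a linear output layer fitted by least
    squares to the hidden-node outputs in columns `cols` of H."""
    if len(cols) == 0:
        return float(y @ y)
    X = H[:, list(cols)]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ w
    return float(r @ r)


def best_subset(H, y, k):
    """Branch-and-bound search for the k candidate nodes (columns of H)
    whose least-squares combination fits y best.

    Bound: deleting a column never lowers the least-squares error, so the
    error of a group is a lower bound for every subgroup of it."""
    best = {"cols": None, "rss": np.inf}

    def search(cols, start):
        value = rss(H, y, cols)
        if value >= best["rss"]:
            return  # prune: no subgroup of `cols` can beat the incumbent
        if len(cols) == k:
            best["cols"], best["rss"] = tuple(cols), value
            return
        # Only delete positions >= start, so each unordered group is
        # visited once; node order never matters (no permutations).
        for i in range(start, len(cols)):
            search(cols[:i] + cols[i + 1:], i)

    search(list(range(H.shape[1])), 0)
    return best["cols"], best["rss"]


# Hypothetical demo: a pool of 8 random tanh nodes; the target is an exact
# linear combination of nodes 1, 4 and 6.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
W = rng.normal(size=(2, 8))
b = rng.normal(size=8)
H = np.tanh(X @ W + b)
y = H[:, [1, 4, 6]] @ np.array([2.0, -1.0, 0.5])

cols, err = best_subset(H, y, 3)
print(cols, err)
```

Pruning is valid because removing a node can only increase (never decrease) the least-squares error; with a non-linear output layer this monotonicity, and hence the cheap bound, would generally be lost.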
Description
Dissertation Ph.D(Ing) -- University of Stellenbosch, 1999.
Keywords
Chemical processes -- Data processing, Neural networks (Computer science), Chemical process control, Chemical engineering -- Data processing, Dissertations -- Chemical engineering