Neurocontroller development for nonlinear processes utilising evolutionary reinforcement learning

Conradie, Alex van Eck (2000-04)

Thesis (MEng)--University of Stellenbosch, 2000.

Thesis

ENGLISH ABSTRACT: The growth in intelligent control has primarily been a reaction to the realisation that nonlinear control theory has been unable to provide practical solutions to present-day control challenges. Consequently, numerous instances of overdesign may be cited in the chemical industry, resulting from attempts to avoid operation near or within complex (often more economically viable) operating regimes. Within these complex operating regimes, robust control system performance may prove difficult to achieve using conventional (algorithmic) control methodologies. Biological neuronal control mechanisms demonstrate a remarkable ability to make accurate generalisations from sparse environmental information. Neural networks, with their ability to learn and their inherent massively parallel processing capability, introduce numerous opportunities for developing superior control structures for complex nonlinear systems. To facilitate neural network learning, reinforcement learning techniques provide a framework that allows learning from direct interaction with a dynamic environment. Reinforcement learning's promise as a means of automating the knowledge acquisition process is beguiling, as it allows control strategies to be developed from cause-and-effect (reward and punishment) interaction information, without needing to specify how the goal is to be achieved. This study aims to establish evolutionary reinforcement learning as a powerful tool for developing robust neurocontrollers for application in highly nonlinear process systems. A novel evolutionary algorithm, Symbiotic, Adaptive Neuro-Evolution (SANE), is utilised to facilitate neurocontroller development. This study also aims to introduce SANE as a means of integrating the process design and process control development functions into a single comprehensive calculation step for maximum economic benefit.
This approach thus provides a tool with which to limit the occurrence of overdesign in the process industry. To investigate the feasibility of evolutionary reinforcement learning in achieving these aims, the SANE algorithm is implemented in an event-driven software environment (developed in Delphi 4.0), which may be applied to both simulation and real-world control problems. Four highly nonlinear reactor arrangements are considered in simulation studies. As a real-world application, a novel batch distillation pilot plant, a Multi-Effect Batch Distillation (MEBAD) column, was constructed and commissioned. The neurocontrollers developed using SANE in the complex simulation studies were found to exhibit excellent robustness and generalisation capabilities. In comparison with model predictive control implementations, the neurocontrollers proved far less sensitive to model parameter uncertainties, removing the need for model mismatch compensation to eliminate steady-state offset. The SANE algorithm also proved highly effective in discovering the operating region of greatest economic return, while simultaneously developing a neurocontroller for this optimal operating point. SANE, however, demonstrated limited success in learning an effective control policy for the MEBAD pilot plant (poor generalisation), possibly because the algorithm's search was confined to too small a region of the state space and because sensor noise disrupted the evaluation process. For industrial applications, starting the evolutionary process from a random initial genetic algorithm population may prove too costly in terms of time and money. Pretraining the genetic algorithm population on approximate simulation models of the real process may reduce the search for the optimal control policy to an acceptable duration.
Applying this neurocontrol development approach from a plantwide perspective should also have significant benefits, as interactions between individual controllers are thereby implicitly eliminated.
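The abstract describes SANE only at a high level. The following Python sketch gives a rough illustration of its symbiotic idea. The algorithm evolves a population of hidden neurons rather than whole networks. It assembles networks from random subsets of those neurons, and it credits each neuron with the average fitness of the networks it took part in. Everything concrete here is a simplifying assumption and does not describe the thesis's Delphi implementation. The assumed elements are the toy plant dx/dt = -x^3 + u, the reward (negative mean squared setpoint error), the encoding (full weight vectors instead of SANE's labelled connections), the population sizes and the crossover and mutation operators.

```python
import math
import random

N_IN, N_OUT = 2, 1      # inputs: setpoint error and a bias term; one output (manipulated variable)
NEURONS_PER_NET = 5     # hidden neurons drawn from the population to form one network
POP_SIZE = 40           # size of the hidden-neuron population (assumed value)
NETS_PER_GEN = 20       # networks assembled and evaluated per generation (assumed value)


def random_neuron(rng):
    # Simplified encoding: input weights followed by the single output weight.
    return [rng.uniform(-1.0, 1.0) for _ in range(N_IN + N_OUT)]


def controller_output(neurons, error):
    # Feed-forward pass: tanh hidden units, output squashed to u in [-2, 2].
    inputs = (error, 1.0)
    total = 0.0
    for w in neurons:
        h = math.tanh(sum(wi * xi for wi, xi in zip(w[:N_IN], inputs)))
        total += w[N_IN] * h
    return 2.0 * math.tanh(total)


def episode_fitness(policy, setpoint=1.0, steps=50, dt=0.1):
    # Hypothetical toy plant dx/dt = -x**3 + u, integrated with explicit Euler steps.
    x, sse = 0.0, 0.0
    for _ in range(steps):
        x += dt * (-x ** 3 + policy(setpoint - x))
        sse += (setpoint - x) ** 2
    return -sse / steps  # reward: negative mean squared error (higher is better)


def sane(generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_neuron(rng) for _ in range(POP_SIZE)]
    best_fit, best_net = -math.inf, None
    for _ in range(generations):
        totals, counts = [0.0] * POP_SIZE, [0] * POP_SIZE
        for _ in range(NETS_PER_GEN):
            idx = rng.sample(range(POP_SIZE), NEURONS_PER_NET)
            net = [population[i] for i in idx]
            fit = episode_fitness(lambda e: controller_output(net, e))
            for i in idx:
                totals[i] += fit
                counts[i] += 1
            if fit > best_fit:
                best_fit, best_net = fit, [list(n) for n in net]
        # Neuron fitness = average fitness of the networks it joined; unused neurons rank last.
        neuron_fit = [totals[i] / counts[i] if counts[i] else -math.inf for i in range(POP_SIZE)]
        order = sorted(range(POP_SIZE), key=lambda i: neuron_fit[i], reverse=True)
        ranked = [population[i] for i in order]
        parents = ranked[: POP_SIZE // 4]
        children = []
        while len(children) < POP_SIZE // 2:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, N_IN + N_OUT)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation applied to each weight with probability 0.1.
            children.append([w + rng.gauss(0.0, 0.1) if rng.random() < 0.1 else w for w in child])
        population = ranked[: POP_SIZE - len(children)] + children  # offspring replace the worst half
    return best_fit, best_net


if __name__ == "__main__":
    fit, net = sane()
    print(f"best network: {fit:.4f}  zero control: {episode_fitness(lambda e: 0.0):.4f}")
```

In the published descriptions of SANE, this shared credit assignment is what makes the evolution "symbiotic". A neuron persists only if it performs well in combination with other neurons, which tends to push the population toward complementary roles rather than converging on a single solution.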

AFRIKAANS SUMMARY (English translation): The current growth in intelligent control systems is primarily a reaction to the realisation that nonlinear control system theory is unable to offer practical solutions to current control challenges. Consequently, numerous instances of overdesign can be cited in the chemical industries, arising from an attempt to avoid operation in or near complex operating regions (often more economically viable). The development of robust control systems with conventional (algorithmic) control techniques in these complex operating regions may be problematic. Biological neural control mechanisms display a remarkable ability to generalise from sparse environmental data. Neural networks, with their ability to learn and their inherent parallel processing ability, offer numerous opportunities for the development of more effective control systems for use in complex nonlinear systems. Reinforcement learning offers a framework within which a neural network learns through direct interaction with a dynamic environment. Reinforcement learning holds promise for knowledge acquisition through the development of control strategies from action and reaction (reward and punishment) interactions, without specifying how the task is to be completed. This study aims to establish evolutionary reinforcement learning as a powerful strategy for the development of robust neurocontrollers in nonlinear process environments. A new evolutionary algorithm, Symbiotic, Adaptive Neuro-Evolution (SANE), is used to develop the neurocontrollers. This study also aims to establish SANE as a way of integrating process design and process control development for maximum economic return. This approach thus offers a strategy whereby instances of overdesign can be limited.
To investigate the feasibility of these aims using evolutionary reinforcement learning, the SANE algorithm was implemented in a Windows environment (developed in Delphi 4.0). The Delphi software can be applied to both simulation and real-world control problems. Four nonlinear reactor designs were considered in the simulation studies. As a real-world control application, a new batch distillation column, a Multi-Effect Batch Distillation (MEBAD) column, was built and commissioned. The neurocontrollers developed by SANE for the complex simulation studies displayed excellent robustness and generalisation ability. In comparison with model predictive control implementations, the neurocontrollers were found to be considerably less sensitive to model parameter uncertainty. The need for model uncertainty compensation to eliminate steady-state offset is consequently removed. The SANE algorithm is also highly effective in searching for the most economic operating condition, while an effective neurocontroller is simultaneously developed for this economically optimal region. SANE, however, showed limited success in learning an effective control strategy for the MEBAD pilot plant (poor generalisation). The poor generalisation may be attributed to the too-small operating region within which the algorithm had to search and to the negative effect of sensor noise on the evaluation process. For industrial applications, it appears that running the evolutionary process from a random initial state is not cost-effective in terms of time and finances. By pretraining the genetic algorithm population on an approximate model, the search period for an optimal control strategy can be shortened considerably. Applying the neurocontrol development strategy from a plantwide perspective may yield considerable benefits, since interactions between individual controllers are thereby implicitly eliminated.
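The abstract suggests pretraining the genetic algorithm population on approximate simulation models before the search moves to the real process. The Python sketch below illustrates that warm-start idea under hypothetical assumptions. The "real process" is a toy plant dx/dt = -a*x^3 + u whose parameter a differs from the approximate model (1.3 versus 1.0). The controller u = 2*tanh(k*e + b) has only two parameters. A plain elitist (mu + lambda) evolution strategy stands in for SANE. None of these names, values or operators come from the thesis.

```python
import math
import random


def make_plant_fitness(a, setpoint=1.0, steps=50, dt=0.1):
    """Fitness for a hypothetical plant dx/dt = -a*x**3 + u under u = 2*tanh(k*e + b)."""
    def fitness(genome):
        k, b = genome
        x, sse = 0.0, 0.0
        for _ in range(steps):
            u = 2.0 * math.tanh(k * (setpoint - x) + b)
            x += dt * (-a * x ** 3 + u)  # explicit Euler step
            sse += (setpoint - x) ** 2
        return -sse / steps  # negative mean squared error: higher is better
    return fitness


def evolve(population, fitness, generations, rng, sigma=0.3):
    """Elitist (mu + lambda) evolution: keep the better half, refill with mutated copies."""
    pop = sorted(population, key=fitness, reverse=True)
    half = len(pop) // 2
    for _ in range(generations):
        parents = pop[:half]
        children = [[g + rng.gauss(0.0, sigma) for g in rng.choice(parents)]
                    for _ in range(len(pop) - half)]
        pop = sorted(parents + children, key=fitness, reverse=True)
    return pop


if __name__ == "__main__":
    rng = random.Random(0)
    approx_model = make_plant_fitness(a=1.0)  # approximate simulation model
    real_plant = make_plant_fitness(a=1.3)    # stand-in for the real process (hypothetical mismatch)
    initial = [[rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)] for _ in range(20)]
    pretrained = evolve(initial, approx_model, generations=40, rng=rng)
    # Only a few "expensive" generations are allowed on the real process.
    cold = evolve(initial, real_plant, generations=5, rng=rng)
    warm = evolve(pretrained, real_plant, generations=5, rng=rng)
    print(f"cold start: {real_plant(cold[0]):.4f}  pretrained start: {real_plant(warm[0]):.4f}")
```

Because the scheme is elitist, the short search on the real process can only improve on the best controller carried over from pretraining. Whether the warm start actually beats a cold start depends on how closely the approximate model matches the plant.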

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/51841