Non-linear neurocontrol of chemical processes using reinforcement learning

Hunter, Stephen Leon (2011-12)

Thesis (MScEng)--Stellenbosch University, 2011.


ENGLISH ABSTRACT: The difficulties of chemical process control using plain Proportional-Integral-Derivative (PID) methods include interaction between manipulated and controlled process variables, as well as difficulty in tuning. One way of eliminating these problems is to use a centralized non-linear control solution such as a feed-forward neural network. While many ways exist to train such neurocontrollers, one promising area of active research is reinforcement learning. The biggest attraction of the reinforcement-learning neurocontrol paradigm is that no expert knowledge of the system is necessary: all control knowledge is gained through interaction with the plant model. This work uses episodic reinforcement learning to train controllers using two types of process model: non-linear dynamic models and non-linear autoregressive models. Training with the first type was termed model-based training, and training with the second data-based learning. By testing the controllers obtained during data-based learning on the original model, the effect of plant-model mismatch, and therefore real-world applicability, could be assessed. In addition, two reinforcement learning algorithms, Policy Gradients with Parameter-based Exploration (PGPE) and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), were compared to one another. Set point tracking was facilitated by the use of integral error feedback. Two control case studies were conducted to test the effectiveness of each type of controller and algorithm and to allow comparison with multi-loop feedback control. The first is a ball mill grinding circuit pilot plant model with 5 degrees of freedom, and the second a 41-stage binary distillation column with 7 degrees of freedom. The ball mill case study showed that centralized non-linear feedback control using neural networks can improve on even highly optimized PI control methods, with the proposed integral error-feedback neural network architecture tracking the set point very well.
CMA-ES produced better results than PGPE, finding solutions up to 20% better. Compared to PI control, the ball mill neurocontrol solution had 6% higher productivity and showed more than 10% improvement in product size set point tracking. With some plant-model mismatch (88% fit), the data-based ball mill neurocontroller still achieved better set point tracking and disturbance handling than PI control, but productivity did not improve. The distillation case study showed less positive results. While reinforcement learning was able to learn successful controllers in the case of no plant-model mismatch and to outperform LV- and (L/D)(V/B)-based PI control, the best-performing neurocontroller still performed up to 20% worse than DB-based PI control. Once again, CMA-ES showed better performance than PGPE, with the latter even failing to find feasible control solutions. While on-line learning in the ball mill study was made impossible by stability issues, on-line adaptation in the distillation case study succeeded with the use of a partial neurocontroller. The learner was able, with a success rate of just over 50%, to achieve greater than 95% purity in both distillate and bottoms within 2,000 minutes of interacting with the plant. Overall, reinforcement learning showed that, when there is sufficient room for improvement over existing control implementations, it can make a very good replacement control solution even when no model is available. Future work should focus on evaluating these techniques in lab-scale control studies.
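The abstract describes the training loop only in words: a feed-forward network receives the set point error together with its integral, and a reinforcement learning algorithm adjusts the network weights from whole-episode rewards. The Python sketch below illustrates that idea with the symmetric-sampling variant of PGPE on a hypothetical first-order plant. The plant coefficients, network size, actuator limit `U_MAX`, reward definition, learning rates, and all function names are assumptions chosen for illustration; they are not the thesis's ball mill or distillation models, nor its actual settings. CMA-ES is not shown.

```python
import math
import random

# --- Hypothetical toy plant (assumption, not a model from the thesis) ---
# y[k+1] = A*y[k] + B*u[k]: a stable first-order process with unit steady-state gain.
A, B = 0.9, 0.1
SETPOINT = 1.0
STEPS = 60       # episode length in time steps (assumed)
HIDDEN = 3       # number of hidden tanh units (assumed)
U_MAX = 2.0      # assumed actuator limit on the manipulated variable
# Flat parameter layout: W1 (HIDDEN x 2), b1 (HIDDEN), w2 (HIDDEN), b2 (1)
N_PARAMS = 2 * HIDDEN + HIDDEN + HIDDEN + 1


def controller(theta, error, integral):
    """Feed-forward net whose inputs are the error and its running sum (integral error feedback)."""
    out = theta[-1]  # output bias b2
    for j in range(HIDDEN):
        pre = theta[2 * j] * error + theta[2 * j + 1] * integral + theta[2 * HIDDEN + j]
        out += theta[3 * HIDDEN + j] * math.tanh(pre)
    return U_MAX * math.tanh(out)  # saturate the output like a real actuator


def episode_reward(theta):
    """Run one closed-loop episode from y = 0; reward is the negative mean squared tracking error."""
    y, integral, cost = 0.0, 0.0, 0.0
    for _ in range(STEPS):
        e = SETPOINT - y
        integral += e
        u = controller(theta, e, integral)
        y = A * y + B * u
        cost += e * e
    return -cost / STEPS


def pgpe(iterations=300, alpha_mu=0.1, alpha_sigma=0.05, seed=0):
    """PGPE with symmetric sampling: perturb all weights at once, compare +eps and -eps episodes."""
    rng = random.Random(seed)
    mu = [0.0] * N_PARAMS      # mean of the Gaussian over controller parameters
    sigma = [0.3] * N_PARAMS   # per-parameter exploration width
    baseline = None            # moving average of rewards, used only for the sigma update
    for _ in range(iterations):
        eps = [rng.gauss(0.0, s) for s in sigma]
        r_plus = episode_reward([m + d for m, d in zip(mu, eps)])
        r_minus = episode_reward([m - d for m, d in zip(mu, eps)])
        r_mean = 0.5 * (r_plus + r_minus)
        if baseline is None:
            baseline = r_mean
        for i in range(N_PARAMS):
            # Mean moves towards whichever side of the symmetric pair scored better.
            mu[i] += alpha_mu * eps[i] * (r_plus - r_minus) / 2.0
            # Width grows if large perturbations beat the baseline, shrinks otherwise (clamped).
            grad_s = (eps[i] ** 2 - sigma[i] ** 2) / sigma[i]
            sigma[i] = min(1.0, max(0.01, sigma[i] + alpha_sigma * (r_mean - baseline) * grad_s))
        baseline = 0.9 * baseline + 0.1 * r_mean
    return mu


if __name__ == "__main__":
    print("untrained reward:", episode_reward([0.0] * N_PARAMS))  # -1.0: u stays 0, error stays 1
    print("trained reward:  ", episode_reward(pgpe()))
```

An all-zero network outputs u = 0, so the error stays at 1 for every step and the reward is exactly -1.0; training should yield a less negative reward. Because each perturbation is evaluated in both directions, the mean update uses only the reward difference and needs no baseline, and saturating the output keeps rewards bounded so that a single bad episode cannot produce an arbitrarily large step.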

AFRIKAANS SUMMARY (translated): The problems of process control using plain Proportional-Integral-Derivative (PID) methods include interaction between manipulated and controlled process variables, as well as difficulty with tuning. One way to eliminate these problems is to use a centralized non-linear solution, such as a feed-forward neural network. There are many ways to train such neurocontrollers, one of the more innovative being reinforcement learning. The biggest attraction of reinforcement learning is that no expert knowledge of the system is required: all control knowledge is acquired through interaction with the plant model. This work uses episodic reinforcement learning to train controllers with two types of process model: non-linear dynamic models and non-linear autoregressive models. The first was called model-based training and the second data-based training. By testing the controllers obtained during data-based training on the original model, the effect of the mismatch between plant and model could be seen, giving an indication of real-world applicability. Two reinforcement learning algorithms were compared with each other: Policy Gradients with Parameter-based Exploration (PGPE) and the Covariance Matrix Adaptation Evolution Strategy. Set point tracking was facilitated by integral error feedback. Two case studies were carried out to test the effectiveness of each type of controller and algorithm, by comparison with PI feedback control. The first is a ball mill pilot plant with 5 degrees of freedom and the second a binary distillation column with 7 degrees of freedom. The ball mill case study showed that centralized non-linear feedback control using neural networks can improve even on highly optimized PI control methods.
Compared to PI control, the ball mill neurocontrol solution maintained 6% higher productivity and showed more than 10% improvement in maintaining the product size set point. With a 12% plant-model mismatch, the data-based ball mill neurocontroller still showed better set point tracking and disturbance handling than PI control, although productivity did not improve. In both cases the integral error solution proved successful, and CMA-ES performed up to 20% better than PGPE. The distillation case study showed that the success of the ball mill case study does not necessarily extend to other plants. Although reinforcement learning was able to learn successful controllers in the case of no plant-model mismatch, the best-performing neurocontroller still performed up to 20% worse than DB-based PI control. Once again CMA-ES performed better than PGPE, with the latter failing even to find working solutions. Although instability made on-line adaptation impossible in the ball mill case study, on-line adaptation in the distillation case study was made possible by the use of a partial neurocontroller. The learner was able, with a success rate of just over 50%, to achieve more than 95% purity in both outlet streams within 2,000 minutes of interaction with the plant. In the end, reinforcement learning showed that, when there is sufficient room for improvement over existing control implementations, it can be a very good replacement even when no model is available. Future work should focus on laboratory-scale applications of these techniques.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/17871