Monitoring and diagnosis of process systems using kernel-based learning methods

Jemwa, Gorden Takawadiyi (2007-12)

Thesis (PhD (Process Engineering))--University of Stellenbosch, 2007.

Dissertation presented for the degree of Doctor of Philosophy in Engineering at the University of Stellenbosch.

Thesis

ENGLISH ABSTRACT: The development of advanced methods of process monitoring, diagnosis, and control has been identified as a major 21st-century challenge in control systems research and application. This is particularly the case for chemical and metallurgical operations, owing to the lack of expressive fundamental models as well as the nonlinear nature of most process systems, which makes established linearization methods unsuitable. As a result, efforts have been directed toward the search for alternative approaches that do not require fundamental or analytical models. Data-based methods provide a very promising alternative in this regard, given the huge volumes of data being collected in modern process operations as well as advances in both theoretical and practical aspects of extracting information from observations. In this thesis, the use of kernel-based learning methods in fault detection and diagnosis of complex processes is considered. Kernel-based machine learning methods are a robust family of algorithms founded on insights from statistical learning theory. Instead of estimating a decision function by minimizing the training error, as other learning algorithms do, kernel methods use a criterion called large margin maximization to estimate a linear learning rule on data embedded in a suitable feature space. The embedding is implicitly defined by the choice of a kernel function and corresponds to inducing a nonlinear learning rule in the original measurement space. Large margin maximization corresponds to developing an algorithm with theoretical guarantees on how well it will perform on unseen data. In the first contribution, the characterization of time series data from process plants is investigated. Whereas complex processes are difficult to model from first principles, they can be identified using historical process time series data and a suitable model structure.
However, prior to fitting such a model, it is important to establish whether the time series data justify the selected model structure. Singular spectrum analysis (SSA) has been used for time series identification. A nonlinear extension of SSA is proposed for classification of time series. Using benchmark systems, the proposed extension is shown to perform better than linear SSA. Moreover, the method is shown to be useful for filtering noise in time series data and, therefore, has potential applications in other tasks such as data rectification and gross error detection. Multivariate statistical process monitoring methods are well-established techniques for efficient information extraction from multivariate data. Such information is usually compact and amenable to graphical representation in two- or three-dimensional plots. For process monitoring purposes, control limits are also plotted on these charts. These control limits are usually based on a hypothesized analytical distribution, typically the Gaussian distribution. A robust approach for estimating confidence bounds using the reference data is proposed. The method is based on one-class classification methods. The usefulness of using data to define a confidence bound in reducing fault detection errors is illustrated using plant data. The use of both linear and nonlinear supervised feature extraction is also investigated. The advantages of supervised feature extraction using kernel methods are highlighted via illustrative case studies. A general strategy for fault detection and diagnosis is proposed that integrates feature extraction methods, fault identification, and different methods to estimate confidence bounds. For kernel-based approaches, the general framework allows for interpretation of the results in the input space instead of the feature space. An important step in process monitoring is identifying a variable responsible for a fault.
Although not all faults that can occur at a plant can be known beforehand, it is possible to use knowledge of previous faults or simulations to anticipate their recurrence. A framework for fault diagnosis using one-class support vector machine (SVM) classification is proposed. Compared to previously studied techniques, the one-class SVM approach is shown to have generally better robustness and performance characteristics. Most methods for process monitoring make little use of data collected under normal operating conditions, whereas most quality issues in process plants are known to occur when the process is in control. In the final contribution, a methodology for continuous optimization of process performance is proposed that combines support vector learning with decision trees. The methodology is based on a continuous search for quality improvements by challenging the normal operating condition regions established via statistical control. Simulated and plant data are used to illustrate the approach.
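
ILLUSTRATIVE SKETCH (not part of the thesis): The idea of deriving a confidence bound from reference data in a kernel feature space, rather than from an assumed Gaussian distribution, can be illustrated with a small example. The sketch below is not the one-class SVM used in the thesis. It substitutes a simpler stand-in: the squared feature-space distance of a sample to the centroid of the reference data, with the control limit taken as an empirical percentile of the reference scores. The function names, the RBF kernel width `gamma`, the `coverage` level, and the ring-shaped synthetic data are all assumptions made for illustration.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def novelty_score(x, reference, gamma, centroid_norm2):
    """Squared feature-space distance between phi(x) and the reference centroid.

    ||phi(x) - mu||^2 = k(x, x) - (2/n) * sum_i k(x, x_i) + ||mu||^2,
    evaluated with kernel values only (the embedding phi is never formed).
    """
    cross = sum(rbf_kernel(x, r, gamma) for r in reference) / len(reference)
    return rbf_kernel(x, x, gamma) - 2.0 * cross + centroid_norm2


def fit_control_limit(reference, gamma=2.0, coverage=0.99):
    """Set the control limit at an empirical percentile of the reference scores."""
    n = len(reference)
    # ||mu||^2 is the mean of all pairwise kernel values.
    centroid_norm2 = sum(
        rbf_kernel(a, b, gamma) for a in reference for b in reference
    ) / (n * n)
    scores = sorted(
        novelty_score(x, reference, gamma, centroid_norm2) for x in reference
    )
    limit = scores[min(n - 1, math.ceil(coverage * n) - 1)]
    return {
        "reference": reference,
        "gamma": gamma,
        "centroid_norm2": centroid_norm2,
        "limit": limit,
    }


def is_fault(model, x):
    """Flag a sample whose score exceeds the data-derived control limit."""
    score = novelty_score(
        x, model["reference"], model["gamma"], model["centroid_norm2"]
    )
    return score > model["limit"]


# Hypothetical reference data: normal operation lies on a noisy ring,
# a shape that a single Gaussian ellipse cannot describe.
n = 200
reference = []
for i in range(n):
    radius = 2.0 + 0.1 * math.sin(7 * i)  # deterministic radial perturbation
    angle = 2.0 * math.pi * i / n
    reference.append((radius * math.cos(angle), radius * math.sin(angle)))

model = fit_control_limit(reference)
flagged = sum(is_fault(model, x) for x in reference)  # reference false alarms
centre_is_fault = is_fault(model, (0.0, 0.0))  # empty centre of the ring
far_is_fault = is_fault(model, (6.0, 6.0))  # point far outside the ring
```

In this sketch the empty centre of the ring is flagged as abnormal, whereas a Gaussian-based limit centred on the sample mean would treat that point as the most normal one. A one-class SVM would replace the centroid distance with a learned boundary, but the way the limit is read off the reference data is analogous.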

AFRIKAANSE OPSOMMING (Afrikaans summary, translated into English): The development of advanced methods for process monitoring, diagnosis and control has been identified as a major 21st-century challenge in the research and application of control systems. This is especially the case in the chemical and metallurgical industries, owing to the lack of fundamental models as well as the nonlinear nature of most process systems, which makes established approaches to linearization unsuitable. As a consequence, efforts are being made to find alternative approaches that do not require fundamental or analytical models. Data-based methods offer promising alternatives in this regard, given the enormous volumes of data stored in modern process plants, as well as the progress being made in both the theoretical and practical aspects of extracting information from observations. In this thesis, the use of kernel-based methods for fault detection and diagnosis of complex processes is considered. Kernel-based machine learning methods are a robust family of methods founded on insights from statistical learning theory. Instead of estimating a decision function by minimizing fitting errors on reference data, as is done with other learning methods, kernel methods use a criterion called large margin maximization to fit linear rules to data embedded in a suitable feature space. The embedding is defined implicitly by the choice of kernel function and corresponds to inducing a nonlinear rule in the original measurement space. Large margin maximization corresponds to developing algorithms whose performance on new data is theoretically guaranteed. In the first contribution, the characterization of time series data from process plants is investigated. Although complex processes are difficult to model from first principles, they can be identified from historical time series data and suitable model structures.
Before such a model is fitted, it is important to establish whether the time series data do in fact support the selected model structure. A nonlinear extension of singular spectrum analysis (SSA) is proposed for the classification of time series. Using benchmark systems, the proposed extension is shown to perform better than linear SSA. Moreover, the method is also shown to be useful for removing noise from time series data and therefore also has potential applications in other tasks, such as data rectification and the detection of systematic errors in data. Multivariate statistical process monitoring is well established for the efficient extraction of information from multivariate data. Such information is usually compact and suitable for representation in two- or three-dimensional plots. For process monitoring purposes, control limits are often plotted on such charts. These control limits are usually based on a hypothesized analytical distribution of the data, typically a Gaussian model. A robust approach for estimating confidence limits based on reference data is proposed in the thesis. The method is based on one-class classification, and its usefulness in using data to estimate the confidence bounds so as to optimize fault detection is illustrated using plant data. The use of both linear and nonlinear supervised feature extraction is subsequently investigated. The advantages of supervised feature extraction using kernel methods are highlighted by means of illustrative case studies. A general strategy for fault detection and diagnosis is proposed that ties together feature extraction methods, fault identification, and different methods for estimating confidence bounds.
For kernel-based methods, the general framework allows the results to be interpreted in the input space rather than in the feature space. An important step in process monitoring is to identify the variables responsible for faults. Although not all faults that can occur at a chemical plant can be known in advance, it is possible to use knowledge of previous faults or simulations to anticipate their recurrence. A framework for fault diagnosis that uses one-class support vector machines (SVMs) is proposed. Compared with other techniques studied previously, the one-class SVM approach is shown to have generally better robustness and performance characteristics. Most methods for process monitoring make little use of data recorded under normal operating conditions, although most quality problems are experienced when the process is in control. In the final contribution, a methodology for the continuous optimization of process performance is proposed that combines support vector machines and decision trees. The methodology is based on the continuous search for quality improvements by testing the normal operating condition limits, as determined by statistical control. Simulated and real plant data are used to illustrate the approach.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/4483