Process monitoring and fault diagnosis using random forests

Auret, Lidia (2010-12)

Thesis (PhD (Process Engineering))--University of Stellenbosch, 2010.

Dissertation presented for the Degree of DOCTOR OF PHILOSOPHY (Extractive Metallurgical Engineering) in the Department of Process Engineering at the University of Stellenbosch

Thesis

ENGLISH ABSTRACT: Fault diagnosis is an important component of process monitoring, relevant in the greater context of developing safer, cleaner and more cost efficient processes. Data-driven unsupervised (or feature extractive) approaches to fault diagnosis exploit the many measurements available on modern plants. Certain current unsupervised approaches are hampered by their linearity assumptions, motivating the investigation of nonlinear methods. The diversity of data structures also motivates the investigation of novel feature extraction methodologies in process monitoring. Random forests are recently proposed statistical inference tools, deriving their predictive accuracy from the nonlinear nature of their constituent decision tree members and the power of ensembles. Random forest committees provide more than just predictions; model information on data proximities can be exploited to provide random forest features. Variable importance measures show which variables are closely associated with a chosen response variable, while partial dependencies indicate the relation of important variables to said response variable. The purpose of this study was therefore to investigate the feasibility of a new unsupervised method based on random forests as a potentially viable contender in the process monitoring statistical tool family. The hypothesis investigated was that unsupervised process monitoring and fault diagnosis can be improved by using features extracted from data with random forests, with further interpretation of fault conditions aided by random forest tools. The experimental results presented in this work support this hypothesis. An initial study was performed to assess the quality of random forest features. Random forest features were shown to be generally difficult to interpret in terms of geometry present in the original variable space. Random forest mapping and demapping models were shown to be very accurate on training data, and to extrapolate weakly to unseen data that do not fall within regions populated by training data. Random forest feature extraction was applied to unsupervised fault diagnosis for process data, and compared to linear and nonlinear methods. Random forest results were comparable to existing techniques, with the majority of random forest detections due to variable reconstruction errors. Further investigation revealed that the residual detection success of random forests originates from the constrained responses and poor generalization artifacts of decision trees. Random forest variable importance measures and partial dependencies were incorporated in a visualization tool to allow for the interpretation of fault conditions. A dynamic change point detection application with random forests proved more successful than an existing principal component analysis-based approach, with the success of the random forest method again residing in reconstruction errors. The addition of random forest fault diagnosis and change point detection algorithms to a suite of abnormal event detection techniques is recommended. The distance-to-model diagnostic based on random forest mapping and demapping proved successful in this work, and the theoretical understanding gained supports the application of this method to further data sets.

AFRIKAANSE OPSOMMING: Foutdiagnose is ’n belangrike komponent van prosesmonitering, en is relevant binne die groter konteks van die ontwikkeling van veiliger, skoner en meer koste-effektiewe prosesse. Data-gedrewe toesigvrye of kenmerkekstraksie-benaderings tot foutdiagnose benut die vele metings wat op moderne prosesaanlegte beskikbaar is. Party van die huidige toesigvrye benaderings word deur aannames rakende liniariteit belemmer, wat as motivering dien om nie-liniêre metodes te ondersoek. Die diversiteit van datastrukture is ook verdere motivering vir ondersoek na nuwe kenmerkekstraksiemetodes in prosesmonitering. Lukrake-woude is ’n nuwe statistiese inferensie-tegniek, waarvan die akkuraatheid toegeskryf kan word aan die nie-liniêre aard van besluitnemingsboomlede en die bekwaamheid van ensembles. Lukrake-woudkomitees verskaf meer as net voorspellings; modelinligting oor datapuntnabyheid kan benut word om lukrakewoudkenmerke te verskaf. Metingbelangrikheidsaanduiers wys watter metings in ’n noue verhouding met ’n gekose uitsetveranderlike verkeer, terwyl parsiële afhanklikhede aandui wat die verhouding van ’n belangrike meting tot die gekose uitsetveranderlike is. Die doel van hierdie studie was dus om die uitvoerbaarheid van ’n nuwe toesigvrye metode vir prosesmonitering gebaseer op lukrake-woude te ondersoek. Die ondersoekte hipotese lui: toesigvrye prosesmonitering en foutdiagnose kan verbeter word deur kenmerke te gebruik wat met lukrake-woude geëkstraheer is, waar die verdere interpretasie van foutkondisies deur addisionele lukrake-woude-tegnieke bygestaan word. Eksperimentele resultate wat in hierdie werkstuk voorgelê is, ondersteun hierdie hipotese. ’n Intreestudie is gedoen om die gehalte van lukrake-woudkenmerke te assesseer. Daar is bevind dat dit moeilik is om lukrake-woudkenmerke in terme van die geometrie van die oorspronklike metingspasie te interpreteer. Verder is daar bevind dat lukrake-woudkartering en -dekartering baie akkuraat is vir opleidingsdata, maar dat dit swak ekstrapolasie-eienskappe toon vir ongesiene data wat in gebiede buite dié van die opleidingsdata val. Lukrake-woudkenmerkekstraksie is in toesigvrye-foutdiagnose vir gestadigde-toestandprosesse toegepas, en is met liniêre en nie-liniêre metodes vergelyk. Resultate met lukrake-woude is vergelykbaar met dié van bestaande metodes, en die meerderheid lukrake-woudopsporings is aan metingrekonstruksiefoute toe te skryf. Verdere ondersoek het getoon dat die sukses van res-opsporing op die beperkte uitsetwaardes en swak veralgemenende eienskappe van besluitnemingsbome berus. Lukrake-woude-metingbelangrikheidsaanduiers en parsiële afhanklikhede is ingelyf in ’n visualiseringstegniek wat vir die interpretasie van foutkondisies voorsiening maak. ’n Dinamiese aanwending van veranderingspuntopsporing met lukrake-woude is as meer suksesvol bewys as ’n bestaande metode gebaseer op hoofkomponentanalise. Die sukses van die lukrake-woudmetode is weereens aan rekonstruksie-reswaardes toe te skryf. ’n Voorstel wat na aanleiding van hierde studie gemaak is, is dat die lukrake-woudveranderingspunt- en foutopsporingsmetodes by ’n soortgelyke stel metodes gevoeg kan word. Daar is in hierdie werk bevind dat die afstand-vanaf-modeldiagnostiek gebaseer op lukrake-woudkartering en -dekartering suksesvol is vir foutopsporing. Die teoretiese begrippe wat ontsluier is, ondersteun die toepassing van hierdie metodes op verdere datastelle.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/5360
This item appears in the following collections: