Browsing by Author "Coomans, Cornelius Johannes"
- Evaluation of statistical analyses for the identification of surrogates and indicators using historical plant data from a water reclamation plant (Stellenbosch : Stellenbosch University, 2017-03) Coomans, Cornelius Johannes; Auret, Lidia; Burger, A. J.; Swartz, C. D.; Stellenbosch University. Faculty of Engineering. Dept. of Process Engineering.

ENGLISH SUMMARY: The lag time associated with water quality monitoring at water reclamation plants (WRPs) is a major obstacle to implementing potable water reclamation in areas suffering from water shortages. Advanced monitoring techniques, which rely in part on surrogate and indicator variables, are one way of reducing this lag time. The aim of this study was to evaluate statistical analyses that could be used to identify relationships between variables, which in turn could be used to develop surrogate and indicator variables following a data-driven approach. The plant data used in this study were obtained from an existing WRP that has been operational for more than five years without any major changes to its treatment and operational procedures. An initial assessment found that the data contained a large proportion of missing values. The assessment also identified the periods during which the plant was operating under ‘normal’ conditions; several periods were removed because abnormal events occurred during them. Data pre-processing consisted of outlier removal (three-sigma rule and Hampel filter), noise reduction (moving average filter) and missing data replacement (linear interpolation). The following statistical analyses were then incorporated into models for identifying variable relationships: Pearson’s and Spearman’s correlation, principal component analysis (PCA), linear discriminant analysis (LDA) and partial least squares (PLS) regression.
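The summary lists the pre-processing steps without their settings. The sketch below chains them in the stated order using only the Python standard library. The window lengths, the threshold of three scaled MADs for the Hampel filter, the trailing moving average and the choice to mark outliers as missing (rather than replace them with the window median, as the classic Hampel filter does) are all assumptions made for illustration, not details taken from the study; the function names are hypothetical.

```python
import statistics
from typing import List, Optional

# A time series from one plant tag; None marks a missing reading.
# Parameter defaults below are illustrative assumptions, not values from the study.
Series = List[Optional[float]]


def three_sigma(values: Series) -> Series:
    """Three-sigma rule: drop points more than 3 standard deviations from the mean."""
    present = [v for v in values if v is not None]
    mu = statistics.fmean(present)
    sd = statistics.pstdev(present)
    return [v if v is None or abs(v - mu) <= 3 * sd else None for v in values]


def hampel(values: Series, half_window: int = 3, n_sigmas: float = 3.0) -> Series:
    """Hampel identifier: drop points far from their local median.

    The scale is the median absolute deviation (MAD) of the window, multiplied
    by 1.4826 so that it estimates the standard deviation of Gaussian data.
    Assumption: flagged points are set to None (removed), not replaced by the median.
    """
    out = list(values)
    for i, v in enumerate(values):
        if v is None:
            continue
        lo, hi = max(0, i - half_window), i + half_window + 1
        window = [w for w in values[lo:hi] if w is not None]
        med = statistics.median(window)
        mad = statistics.median(abs(w - med) for w in window)
        if abs(v - med) > n_sigmas * 1.4826 * mad:
            out[i] = None
    return out


def moving_average(values: Series, window: int = 3) -> Series:
    """Trailing moving average over the available points; gaps stay None."""
    out: Series = []
    for i, v in enumerate(values):
        if v is None:
            out.append(None)  # leave gaps for the interpolation step
            continue
        chunk = [w for w in values[max(0, i - window + 1): i + 1] if w is not None]
        out.append(statistics.fmean(chunk))
    return out


def interpolate(values: Series) -> Series:
    """Linear interpolation between known neighbours; leading/trailing gaps stay None."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(known, known[1:]):
        for j in range(a + 1, b):
            out[j] = values[a] + (j - a) / (b - a) * (values[b] - values[a])
    return out


def preprocess(values: Series) -> Series:
    """Outlier removal, noise reduction, then missing data replacement."""
    return interpolate(moving_average(hampel(three_sigma(values))))
```

Marking flagged points as missing lets the final interpolation step fill both the original gaps and the removed outliers; leading and trailing gaps are left empty because there is nothing to interpolate between.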
The performance of the different statistical analyses was measured using statistical metrics: R² for correlation, visualisation of separation for PCA, classification error for LDA, and both R² and mean squared error (MSE) for the PLS models. The bivariate correlations provided the most concise results, whilst the LDA models could not be effectively assessed due to a change in the behaviour of the training and testing data. The PLS models performed poorly and did not produce any significant results. Expert process knowledge was also used to determine which of the variable relationships identified by the models could be regarded as valuable contributions and which ought to be regarded as trivial. Overall, the bivariate correlations were found to be effective for detecting relationships between variables. PCA was a valuable tool that provided insight into the potential use of multivariate analyses. LDA and PLS regression may require further testing before a definitive ruling can be made on their usefulness for identifying variable relationships from unprocessed historical plant data. Although historical data could be used to identify variable relationships using bivariate correlations, its use is not recommended for multivariate statistical analyses. A planned sampling campaign could be much more effective for data collection than historical data, although the cost of such a campaign must be taken into consideration.
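The summary reports R² as the metric for the correlation analyses but does not describe how the coefficients were computed. As a minimal, dependency-free sketch (an illustration, not the study's code), the block below computes Pearson's r and Spearman's ρ for two paired, gap-free series, for instance two tags after pre-processing. R² is taken here as r², which is one common convention for a single bivariate correlation; the helper names and the example data are hypothetical.

```python
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r computed on the ranks of the data."""
    return pearson(ranks(x), ranks(y))


# A monotonic but non-linear relationship between two hypothetical tags.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]
r, rho = pearson(x, y), spearman(x, y)
print(f"r = {r:.3f}, R^2 = {r * r:.3f}, rho = {rho:.3f}")
```

Here ρ is exactly 1 because both series rise in the same order, while r falls slightly short of 1 because the relationship is not linear; this difference is why the two coefficients are often reported side by side.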