Interpretation of nonlinear relationships between process variables by use of random forests
Better understanding of process phenomena depends on the interpretation of models that capture the relationships between process variables. Although linear regression is used routinely in the mineral process industries for this purpose, it may be of limited use where the relationships between variables are nonlinear or complex. Under these circumstances, nonlinear methods such as neural networks or decision trees can be used to develop reliable models, but these do not necessarily give any explicit insight into the relationships between the process and target variables. This is a major drawback where such information is important, for example in fault identification or in gaining a better understanding of the fundamentals of a process. In this paper, variable importance measures and partial dependence plots generated by random forest models are proposed as a practical tool for surmounting this problem. In particular, it is shown that important variables can be flagged by an appropriate threshold generated by including dummy variables in the system. Moreover, the results of the study indicate that random forest models can reliably identify the influence of individual variables, even in the presence of high levels of additive noise. This makes random forests a useful tool in continuous process improvement and root cause analysis of abnormal process behaviour. © 2012 Elsevier Ltd. All rights reserved.
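The two tools described in the abstract, dummy-variable thresholds for variable importance and partial dependence, can be illustrated with a minimal sketch, assuming scikit-learn and NumPy are available. Everything here is hypothetical: the data are synthetic (x1 and x2 act nonlinearly on y, x3 has no effect, and Gaussian noise is added), the importance measure is scikit-learn's `permutation_importance` (the drop in held-out R² when a column is shuffled), and the rule "flag a variable only if its importance exceeds that of every dummy" is an illustrative choice. The paper's exact importance measure and thresholding procedure may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Hypothetical process data: x1 and x2 act nonlinearly on y, x3 has no effect.
x1 = rng.uniform(0.0, 1.0, n)
x2 = rng.uniform(-1.0, 1.0, n)
x3 = rng.uniform(-1.0, 1.0, n)
y = 2.0 * np.sin(np.pi * x1) + 3.0 * x2**2 + rng.normal(0.0, 0.5, n)  # additive noise

# Dummy variables: pure noise, unrelated to y by construction.
n_dummies = 5
dummies = rng.uniform(-1.0, 1.0, (n, n_dummies))

names = ["x1", "x2", "x3"] + [f"dummy{i}" for i in range(n_dummies)]
X = np.column_stack([x1, x2, x3, dummies])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
rf.fit(X_train, y_train)

# Permutation importance on held-out data (mean drop in R^2 over 10 shuffles).
imp = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0).importances_mean

# Threshold: the largest importance reached by any dummy variable.
threshold = imp[3:].max()
flagged = [name for name, v in zip(names[:3], imp[:3]) if v > threshold]


def partial_dependence_1d(model, X, col, grid):
    """Average model prediction with column `col` fixed at each grid value."""
    pd_values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, col] = value
        pd_values.append(model.predict(X_mod).mean())
    return np.array(pd_values)


grid_x1 = np.linspace(0.05, 0.95, 19)   # index 9 is x1 = 0.5
grid_x2 = np.linspace(-0.95, 0.95, 19)  # index 9 is x2 = 0.0
pd_x1 = partial_dependence_1d(rf, X_train, 0, grid_x1)
pd_x2 = partial_dependence_1d(rf, X_train, 1, grid_x2)

print("importances:", dict(zip(names, np.round(imp, 3))))
print("dummy threshold:", round(float(threshold), 3), "flagged:", flagged)
```

In this setup the partial dependence curve for x1 should rise and then fall (the sine-shaped response), and the curve for x2 should be U-shaped. This recovers the form of each relationship, not just its strength. The irrelevant x3 is expected to have an importance comparable to the dummies', which is what the threshold is meant to screen out.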