Browsing by Author "Malan, Francina"
- Extracting failure modes from unstructured, natural language text (Stellenbosch : Stellenbosch University, 2020-03) Malan, Francina; Jooste, J. L.; Stellenbosch University. Faculty of Industrial Engineering. Dept. of Industrial Engineering.

ENGLISH ABSTRACT: This thesis investigates whether text mining (and the related fields of machine learning and natural language processing) can be used to extract useful information, specifically failure modes, from the low-quality, unstructured text records available in industry. Failure data, and particularly information about failure modes, is imperative for good asset management, but frequently goes underutilised because it is buried in unstructured text that is not amenable to traditional analytics and is too resource intensive to process manually. While the ideal solution would be to improve the information management system to prevent the collection of such data, this only addresses the quality of future data; years of historic data would still be lost. Several authors have acknowledged the prevalence of text-based maintenance records, identifying both the potential value and the problems in utilising this data, with many suggesting some form of text mining as a possible solution. Within this and related fields, there is a gap between the academic and industry-focussed literature. This pertains to both the scarcity of industry (and especially maintenance-specific) research and the inadequate attention given to the theoretical basis of these fields in the available industry literature. The biggest concern pertains to the violation of the independent, identically distributed (IID) assumption in maintenance data and the impact this has on the validity of various evaluation schemes. Other concerns relate to the optimisation of preprocessing parameters and the choice of evaluation metric used to assess performance. This project was completed within the CRISP-DM framework.
For the research objectives, both the more practical, industry-focussed studies and the more theoretical, academic studies were investigated. In the experimental component, two families of algorithms were evaluated, namely Support Vector Machines and Naïve Bayes. The focus was on the validity of the modelling and evaluation process, based on problems identified in literature. Noteworthy aspects of this procedure include using blocked cross-validation as the outer, evaluation loop of a nested cross-validation, both to account for the IID violation and to prevent the over-optimisation that can occur with single-loop cross-validation. The most important contribution of this work is the experimental design, which consolidates multiple validity concerns that are raised in academic literature but receive limited attention in industry. In particular, it addresses the violation of the IID assumption in standard cross-validations (Bergmeir and Benitez, 2012), the importance of including preprocessing in the model optimisation (Krstajic et al., 2014), the high potential of randomised search optimisation (Bergstra and Bengio, 2012) and the different formulations of the cross-validated F-score (Forman and Scholz, 2010). The recommendations made by authors investigating these issues in isolation were combined to form the experimental design. It is, however, worth noting that the methodological conclusions made in this study are based on the evaluation of a single dataset and are not necessarily indicative of general behaviour. The project concludes that while text mining offers a viable solution for the identified problem, doing so is not a trivial process and would require substantial commitment from organisations wishing to utilise their data.
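
Two of the evaluation ideas named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the thesis's implementation. The function names (`blocked_folds`, `f1_averaged`, `f1_pooled`), the fold sizes and the example counts are all hypothetical, and treating an undefined F-score as 0 is an assumption. The first function builds contiguous, unshuffled test blocks, as blocked cross-validation does; the other two contrast averaging per-fold F-scores with computing one F-score from counts pooled over all folds. Which formulation the thesis adopts is not stated in the abstract.

```python
from typing import List, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def blocked_folds(n_samples: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) index lists where each test set is one contiguous block.

    Records are assumed to be in their original (e.g. chronological) order;
    nothing is shuffled, so neighbouring records stay in the same block.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    folds = []
    start = 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)  # spread the remainder over the first blocks
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined (an assumption)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_averaged(fold_counts: List[Counts]) -> float:
    """Mean of the F1 scores computed separately on each fold."""
    return sum(f1_from_counts(*c) for c in fold_counts) / len(fold_counts)


def f1_pooled(fold_counts: List[Counts]) -> float:
    """One F1 score computed from counts summed over all folds."""
    tp = sum(c[0] for c in fold_counts)
    fp = sum(c[1] for c in fold_counts)
    fn = sum(c[2] for c in fold_counts)
    return f1_from_counts(tp, fp, fn)


if __name__ == "__main__":
    for train, test in blocked_folds(10, 3):
        print("test block:", test)
    counts = [(1, 0, 0), (0, 0, 1)]  # hypothetical per-fold counts for a rare class
    print(f1_averaged(counts), f1_pooled(counts))
```

With the hypothetical counts above, the averaged score is 0.5 while the pooled score is 2/3, which shows how the two formulations can diverge when a class is rare within individual folds.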