Predicting process performance in the manufacturing and agricultural sectors using machine learning techniques

Khoza, Sibusiso Comfort (2021-03)

Thesis (MEng)--Stellenbosch University, 2021.


ENGLISH ABSTRACT: The business-to-business (B2B) expenditure in the African manufacturing industry is projected to rise to almost two-thirds of $1 trillion by 2030, whilst the global agriculture and agriprocessing sector is projected to remain the largest economic sector with a B2B expenditure just $84.7 billion shy of $1 trillion by 2030. Amongst researchers and policymakers, there is a general consensus that a robust manufacturing sector is the fundamental route towards economic development and growth. In the manufacturing sector, product quality has become one of the most important factors in the success of companies. Improving agricultural productivity will be key in combating the poverty that has befallen the African continent. The increasing demand for quality land (60% of which is claimed to be on the African continent) and yields are seen as key drivers for the expected growth of the global agricultural sector. The technological innovations seen by both sectors produce data that can be mined to derive insights that will help improve quality and productivity, thus improving the bottom line for businesses. In this thesis, cognisance is given to the fact that answers to business questions can be either numerical or categorical in nature; hence, two case studies are carried out to demonstrate the application of machine learning in providing categorical and numerical answers to business questions. In the first case study, the use of machine learning algorithms in quality control is compared to the use of statistical process monitoring, a classical quality management technique. The test dataset has a large number of features, which requires the use of principal component analysis and clustering to isolate the data into potential process groups. In the second case study, several machine learning algorithms were applied to predict daily milk yield in a dairy farm.
Random forest, support vector machine (SVM) and naive Bayes algorithms were used to predict when the manufacturing process is out of control or will produce a poor quality product. The random forest algorithm performed significantly better than both the naive Bayes and SVM algorithms on all three clusters of the dataset. The results were benchmarked against Hotelling’s T² control charts, which were trained using 80% of each cluster dataset and tested on the remaining 20%. In comparison with Hotelling’s T² multivariate statistical process monitoring charts, the random forest algorithm emerges as the better quality control method. The significance of this study is that it is arguably the first study comparing the application of machine learning algorithms to statistical process control. Random forest, support vector machine, and multilinear regression algorithms were used to predict daily milk yield in a dairy farm. The algorithms were applied to two subsets of a dairy farm dataset; in addition to daily milk yield, the first subset contains only the features that describe environmental conditions at the dairy farm, whilst the second subset contains the “environmental” features as well as other features that may be regarded as “health” features. Using the mean absolute percentage error as the primary metric, no algorithm is seen as superior to the other algorithms on the first subset (at a significance level of 0.1). The stepwise multilinear regression algorithm performed significantly better than all non-linear-model-based algorithms. The significance of this second case study is that it compares the multilinear regression algorithms commonly applied to predict daily milk yield with the less commonly applied random forest algorithm, whilst also assessing the impact of data normalisation.
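The mean absolute percentage error named above as the primary metric can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name and the sample yield values are hypothetical, chosen only to show how the metric is computed.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Return the MAPE, in percent, between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    if any(a == 0 for a in actual):
        # MAPE divides by the actual value, so zero observations are undefined
        raise ValueError("actual values must be non-zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical daily milk yields (invented for illustration, not thesis data)
actual_yield = [20.0, 25.0, 40.0]
predicted_yield = [18.0, 25.0, 44.0]

mape = mean_absolute_percentage_error(actual_yield, predicted_yield)
print(f"MAPE: {mape:.2f}%")
```

A lower value indicates predictions closer to the observed yields. Because each error is scaled by the observed value, the metric is unit-free, which makes it convenient for comparing algorithms on the same subset.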

AFRIKAANSE OPSOMMING: Na verwagting sal die besigheid tot besigheid (B2B) uitgawes van die Afrika-vervaardigingsbedryf teen 2030 tot byna twee derdes van $1 triljoen styg, terwyl die wêreldwye landbou- en landbouverwerkingsektor na verwagting die grootste ekonomiese sektor sal bly met B2B-uitgawes net $84,7 miljard minder as $1 triljoen teen 2030. Onder navorsers en beleidmakers is daar algemene konsensus dat ’n robuuste vervaardigingsektor die fundamentele weg na ekonomiese ontwikkeling en groei is. In die vervaardigingsektor het die kwaliteit van die produk een van die belangrikste faktore in die sukses van ondernemings geword. Die verbetering van landbouproduktiwiteit sal die sleutel wees tot die bestryding van die armoede wat die Afrika-kontinent getref het. Toenemende vraag, en die kwaliteit van grond (waarvan 60% beweer word op die vasteland van Afrika is) en opbrengste word gesien as die belangrikste dryfvere vir die verwagte groei in die landbousektor. Die tegnologiese innovasies wat deur beide sektore gesien word, lewer data op wat ontgin kan word om insigte te verkry wat sal help om kwaliteit en produktiwiteit te verbeter, en sodoende die wins van ondernemings te verbeter. In hierdie tesis word kennis gegee aan die feit dat sommige antwoorde op sakevrae numeries of kategories van aard kan wees; dus word twee gevallestudies uitgevoer om die toepassing van masjienleer te demonstreer vir die verskaffing van kategoriese en numeriese antwoorde op besigheidsvrae. In die eerste gevallestudie word die gebruik van masjienleeralgoritmes in kwaliteitsbeheer vergelyk met die gebruik van statistiese prosesmonitering, ’n klassieke kwaliteitsbestuurstegniek. Die toetsdatastel het ’n groot aantal veranderlikes, wat die gebruik van hoofkomponentontleding en groepering vereis om die data in potensiële prosesgroepe te isoleer. In die tweede gevallestudie is daar verskeie masjienleeralgoritmes toegepas om die daaglikse melkopbrengs in ’n melkboerdery te voorspel.
’n Random forest, support vector machine- en naive Bayes-algoritme is gebruik om te voorspel wanneer die vervaardigingsproses buite beheer is of ’n produk van swak gehalte sal lewer. Die random forest-algoritme het aansienlik beter gevaar as die naive Bayes en SVM-algoritmes op al drie groepe van die datastel. Die resultate is getoets teen die T²-kontrolekaart van Hotelling, wat geleer is met behulp van 80% van elke groep-datastel en op die oorblywende 20% getoets is. In vergelyking met Hotelling se T² meerveranderlike statistiese prosesmoniteringskaarte, kom die random forest-algoritme steeds na vore as die beter gehaltebeheer metode. Die hoofbydrae van hierdie studie is dat dit waarskynlik die eerste studie is wat die toepassing van masjienleeralgoritmes vergelyk met statistiese prosesbeheer. Random forest, support vector machine en multilineêre regressie algoritmes is gebruik om melkopbrengs vir ’n melkboerdery te voorspel. Die algoritmes is toegepas op twee dele van ’n melkboerderydatastel; benewens die daaglikse melkopbrengs, bevat die eerste datastel slegs die veranderlikes wat die omgewingstoestande op die melkplaas beskryf, terwyl die tweede datastel die omgewingsveranderlikes sowel as ander veranderlikes bevat wat as gesondheidskenmerke beskou kan word. As die gemiddelde absolute persentasiefout as primêre maatstaf gebruik word, word geen algoritme as beter beskou in vergelyking met die ander algoritmes op die eerste datastel nie (op ’n betekenisvlak van 0.1). Die stapsgewyse multilineêre regressie algoritme het aansienlik beter gevaar as alle nie-lineêre-model-gebaseerde algoritmes. Die hoofbydrae van hierdie studie is dat dit die algemeen toegepaste multilineêre regressie algoritmes om daaglikse melkopbrengste te voorspel vergelyk met die minder algemeen toegepaste random forest algoritme, terwyl die impak van data-normalisering ook beoordeel word.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/109833