A remote sensing-machine learning framework for modelling forest health

Poona, Nitesh Keshavelal (2020-12)

Thesis (DPhil)--Stellenbosch University, 2020.


ENGLISH ABSTRACT: Remote sensing data, in particular high-dimensional spectroscopy data, are now widely used for the detection and monitoring of pests and diseases in agriculture and forestry. Coupled with advanced data analytics, spectroscopic data can provide a wealth of information regarding vegetation health. This research demonstrates the utility of spectroscopic data and advanced machine learning (ML) algorithms, i.e. tree-based ensemble learners, by developing a remote sensing-machine learning framework for forest health assessment and monitoring. Specifically, the research investigates the use of spectroscopic data for modelling Fusarium circinatum stress in Pinus radiata and Pinus patula. The research first investigated the utility of novel wrapper feature selection algorithms embedded with the random forest (RF) learner to develop classification models for discriminating healthy, infected, and damaged P. radiata and P. patula seedlings within a nursery environment. Results showed that reducing data dimensionality improved model accuracies. More importantly, the RF-Boruta framework yielded the best results. Two RF variants were subsequently explored, namely oblique random forest (oRF) and rotation forest (rotF). The performances of oRF and rotF were benchmarked against that of traditional RF. All models were evaluated in terms of their ability to discriminate healthy and stressed Pinus seedlings. Spectral resampling was employed to reduce data dimensionality. The oRF model yielded the best results, with oRF svm (oRF employing a support vector machine as the splitting model) proving to be the most robust. To extend the utility of model building, the research developed normalised difference two-band spectral indices for real-time F. circinatum stress detection. The Boruta algorithm was employed to identify relevant bands, which were used to derive two-band indices. 
The indices were compared with an extensive list of existing indices identified from the literature to assess their value. Indices were evaluated within univariate and multivariate paradigms, with the latter proving more adept at classifying healthy, damaged, and infected seedlings. The use of high spatial resolution satellite remote sensing imagery for modelling pitch canker in P. radiata trees in a commercial plantation was also evaluated. This exploration served to complement the remote sensing-machine learning framework developed for the nursery environment. In this component of the research, an artificial neural network model was used (whereas tree-based ensemble models were used in the earlier elements of the research). Results highlight the potential of using high spatial resolution satellite remote sensing for mapping and monitoring pitch canker-infected trees. Overall, the research demonstrated that high spectral and high spatial resolution remotely sensed data, coupled with advanced data analytics, i.e. tree-based ensemble learners and wrapper algorithms, provide a potentially operational and economically viable framework for F. circinatum management within nursery and plantation environments.
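
The normalised difference two-band indices mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard normalised difference form, (R1 - R2) / (R1 + R2), applied to every pair of bands flagged as relevant. The function names, the wavelengths, and the reflectance values are hypothetical and are not taken from the thesis, and the band selection step (Boruta in the thesis) is not reproduced here.

```python
from itertools import combinations


def normalised_difference(r1, r2):
    """Return (r1 - r2) / (r1 + r2), or 0.0 when the denominator is zero."""
    total = r1 + r2
    return (r1 - r2) / total if total != 0 else 0.0


def two_band_indices(spectrum, bands):
    """Compute a normalised difference index for every pair of selected bands.

    spectrum: dict mapping wavelength (nm) to reflectance for one sample
    bands: wavelengths assumed to have been flagged as relevant beforehand
    """
    return {
        (b1, b2): normalised_difference(spectrum[b1], spectrum[b2])
        for b1, b2 in combinations(sorted(bands), 2)
    }


# Hypothetical reflectance values, for illustration only.
spectrum = {550: 0.10, 680: 0.05, 800: 0.45}
indices = two_band_indices(spectrum, [680, 800, 550])

for (b1, b2), value in indices.items():
    print(f"ND({b1}, {b2}) = {value:.3f}")
```

Each resulting index is a single number per sample, so it could be thresholded on its own (a univariate use) or passed with other indices to a classifier (a multivariate use).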

AFRIKAANS ABSTRACT (English translation): Remote sensing data, especially high-dimensional spectroscopy data, are regularly used for the detection and monitoring of pests and diseases in the agriculture and forestry sectors. Together with advanced data analytics, spectroscopic data can provide a wealth of information about the condition of vegetation. Spectroscopic data and advanced machine learning algorithms, namely tree-based ensemble learners, can be used to develop a framework for assessing and monitoring forest health. This research specifically investigates the use of spectroscopic data for modelling Fusarium circinatum in Pinus radiata and Pinus patula. The research investigated the utility of selection algorithms by incorporating new wrapper functions into the random forest (RF) algorithm to develop classification models that discriminated healthy, infected, and damaged P. radiata and P. patula seedlings within a nursery environment. Results showed that reducing data dimensionality leads to higher model accuracy. The results also showed that the RF-Boruta framework delivered the best results. Two RF variants were then investigated, namely oblique random forest (oRF) and rotation forest (rotF). The performance of oRF and rotF was compared with that of traditional RF. All models were assessed on their ability to discriminate healthy and damaged Pinus seedlings. Spectral resampling was used to reduce the dimensionality of the data. The oRF model delivered the best results, with oRF svm (which uses a support vector machine as the splitting model) being the most robust. To extend the usefulness of model building, the research developed normalised difference two-band spectral indices to detect F. circinatum stress in real time. The Boruta algorithm was used to identify relevant bands and then to derive two-band indices. 
The indices were compared with indices identified from the literature to assess their value. Indices were evaluated within univariate and multivariate paradigms, and the latter classified healthy, damaged, and infected seedlings better. The use of high spatial resolution satellite remote sensing imagery for modelling canker in P. radiata trees in a commercial plantation was also investigated. This section of the research served as a complement to the remote sensing-machine learning framework developed for the nursery environment. In this component of the research, an artificial neural network model was used (whereas tree-based ensemble models were used in the preceding elements of the research). Results emphasise the potential of using high spatial resolution satellite remote sensing data for mapping and monitoring infected trees. This research showed that high spectral and high spatial resolution remote sensing data, together with advanced data analytics (tree-based ensemble learners and wrapper algorithms), offer an economically feasible operational framework for managing F. circinatum in nursery and plantation environments.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/109228