Discriminating between forest plantation genera using remote sensing and machine learning algorithms

Date
2021-12
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT: Forest inventories are compiled at compartment level and contain information such as forest age, species/genus, location, and extent. An up-to-date forest inventory is critical for monitoring harvests, assessing timber production, planning, maximising production, assessing water use, and assessing timber quality. At a national scale, forest inventories are used to monitor the impact forests have on the climate and streamflow, assess the contribution forests make to alleviating poverty, monitor forest trends, and support policy and trade decisions. Conventionally, forest inventory information such as plantation genus/species is collected in the field, which is time-consuming and costly. Remote sensing is a more efficient way to capture forest genus information. Very high-resolution, hyperspectral, and unmanned aerial vehicle (UAV) imagery have been shown to contain suitable spectral and spatial information for machine learning algorithms to differentiate between forest species. However, such data require extensive processing and are expensive to acquire, making them unsuitable for mapping larger areas. High-resolution imagery such as Sentinel-2, combined with textural measures and vegetation indices as features in machine learning algorithms, has shown potential to differentiate between spectrally similar classes. However, it is not known what impact training sample configuration and size have on classification accuracy when classifying acacia, eucalyptus, and pinus (pine) genera. It is also not known whether signature extension is a viable method for reducing the time and effort spent on obtaining in situ training data when mapping forest plantations over a large and complex area. This research comprised two main experiments.
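The abstract mentions vegetation indices and textural measures as classifier features without listing them. As a minimal illustration only, the sketch below computes NDVI from Sentinel-2 band B8 (near-infrared) and band B4 (red), plus a 3x3 local standard deviation as a crude texture proxy. The choice of NDVI, the function names, and the local-standard-deviation measure are assumptions for illustration, not the thesis's documented feature set; grey-level co-occurrence matrix (GLCM) statistics are a widely used alternative for texture.

```python
import statistics


def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 where both bands are zero."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def ndvi_grid(b8, b4):
    """Per-pixel NDVI for two same-sized 2-D grids (lists of rows) of B8 and B4 reflectance."""
    return [[ndvi(n, r) for n, r in zip(row8, row4)] for row8, row4 in zip(b8, b4)]


def local_std(grid, row: int, col: int) -> float:
    """Population standard deviation in the 3x3 window around (row, col), clipped at the edges.

    A crude texture proxy for illustration (hypothetical), not a GLCM statistic.
    """
    values = [
        grid[r][c]
        for r in range(max(0, row - 1), min(len(grid), row + 2))
        for c in range(max(0, col - 1), min(len(grid[0]), col + 2))
    ]
    return statistics.pstdev(values)


if __name__ == "__main__":
    # Hypothetical surface reflectance values for a 2x2 patch (not real Sentinel-2 data).
    b8 = [[0.40, 0.42], [0.38, 0.45]]
    b4 = [[0.05, 0.06], [0.10, 0.04]]
    index = ndvi_grid(b8, b4)
    print(index)
    print(local_std(index, 0, 0))
```

Per-pixel values such as these would be stacked with the raw band reflectances to form the feature vector passed to a classifier.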
The first experiment evaluated the impact of using an even, uneven, or area-proportionate training sample configuration and size in a random forest machine learning model for classifying acacia, eucalyptus, and pine compartments. In a study area where the planted areas of acacia, eucalyptus, and pine were unequal, a balanced training sample configuration produced more accurate classifications than either an unbalanced or an area-proportionate configuration. It was also found that a saturation point exists beyond which adding more training samples adds little value to the overall accuracy (OA). The saturation point was found to be ~57n, where n is the number of features used in the classification. The second set of experiments was designed to test the viability of training data signature extension for constructing random forest machine learning models that differentiate between acacia, eucalyptus, and pine trees using Sentinel-2 imagery as input. The study area was split into 19 Sentinel-2 tiles spanning the Mpumalanga, KwaZulu-Natal, Eastern Cape, and Western Cape provinces. Three separate random forest models were built using training data collected in one tile located in Mpumalanga, one in KwaZulu-Natal, and one in the Eastern Cape. A fourth model was built using training data from all three source tiles. The four models were applied to all 19 Sentinel-2 tiles to map forest plantation genera. The results show that an OA of ~70% can be achieved if the training data are collected in areas with climates (rainfall seasonality) similar to those of the areas being mapped. In addition, it was found that the signature extension distance (i.e. the distance between the training data and the area being classified) should not exceed 500 km.
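The training sample configurations, the ~57n saturation point, and the 500 km signature extension limit described above can be sketched in plain Python. This is a hypothetical illustration: the planted areas, feature count, and coordinates are invented, and treating ~57n as the total sample count (rather than a per-class count) is an assumption, because the abstract does not say which is meant. Distances use the haversine formula on a spherical Earth, which is an approximation.

```python
import math

# Hypothetical planted areas per genus (ha); illustrative values, not thesis data.
PLANTED_AREA_HA = {"acacia": 1000.0, "eucalyptus": 6000.0, "pine": 3000.0}


def saturation_sample_size(n_features: int, factor: int = 57) -> int:
    """Approximate training-set size beyond which OA gains level off (~57n).

    Assumption: 57n is treated as the total number of samples, not per class.
    """
    return factor * n_features


def balanced_allocation(classes, total: int) -> dict:
    """Even (balanced) configuration: the same number of samples for every class."""
    per_class = total // len(classes)
    return {c: per_class for c in classes}


def area_proportionate_allocation(areas: dict, total: int) -> dict:
    """Area-proportionate configuration: samples scale with each class's planted area.

    Rounding means the counts may not sum exactly to `total` for every input.
    """
    area_sum = sum(areas.values())
    return {c: round(total * a / area_sum) for c, a in areas.items()}


def haversine_km(lat1, lon1, lat2, lon2, radius_km: float = 6371.0) -> float:
    """Great-circle distance in km between two (lat, lon) points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def within_extension_limit(source, target, limit_km: float = 500.0) -> bool:
    """True if the target lies within the suggested signature extension distance of the source."""
    return haversine_km(*source, *target) <= limit_km


if __name__ == "__main__":
    n_features = 20  # hypothetical number of spectral, index, and texture features
    total = saturation_sample_size(n_features)
    print(total)  # 1140
    print(balanced_allocation(list(PLANTED_AREA_HA), total))
    # {'acacia': 380, 'eucalyptus': 380, 'pine': 380}
    print(area_proportionate_allocation(PLANTED_AREA_HA, total))
    # {'acacia': 114, 'eucalyptus': 684, 'pine': 342}
    # Hypothetical tile-centre coordinates (lat, lon); not actual Sentinel-2 tiles.
    print(within_extension_limit((-25.5, 30.5), (-29.5, 30.5)))  # True (~445 km)
```

In a full workflow, the allocated samples would then be used to train a classifier such as scikit-learn's `RandomForestClassifier`; the sketch itself does not depend on that library. An uneven configuration is any other unequal split of the total and is not modelled separately here.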
AFRIKAANS SUMMARY (AFRIKAANSE OPSOMMING): Plantation forest inventories are compiled at compartment level and contain information such as the age, species/genus, location, and extent of the plantation. An up-to-date plantation forest inventory is critical for monitoring harvests, assessing timber production, planning, maximising production, and assessing water use and timber quality. At a national scale, plantation forest inventories are used to monitor the impact of forestry on the climate and streamflow, to assess the contribution forestry makes to alleviating poverty, to monitor forestry trends, and to support policy and trade decisions. Conventional methods for obtaining forest inventory information, such as plantation genus/species, are carried out in the field, which is time-consuming and expensive. Remote sensing is a more efficient way to capture tree genus information. Very high-resolution and hyperspectral imagery, as well as imagery captured by unmanned aerial vehicles, have been shown to contain suitable spectral and spatial information to enable machine learning algorithms to distinguish between tree species. However, such data require extensive processing and are expensive to acquire, which makes them unsuitable for mapping large areas. High-resolution imagery such as Sentinel-2, combined with texture measures and vegetation indices as variables in machine learning algorithms, shows potential to distinguish between classes with similar spectral properties. However, it is not known how training data configuration and size will affect the accuracy of acacia, eucalyptus, and pinus (pine) genus classifications. It is also not known whether signature extension is a viable method for reducing the time and effort needed to obtain training data in situ when forest plantations are mapped over a large area. This research comprised two main experiments.
The first experiment evaluated the impact of using an equal, unequal, or area-proportionate training sample configuration and size in a random forest machine learning model for classifying acacia, eucalyptus, and pine plantations. For the study area, which contains unequal areas of acacia, eucalyptus, and pine trees, more accurate results were achieved when a balanced training sample configuration was used. It was also found that a saturation point exists beyond which adding more training samples adds little value to the overall accuracy (OA). The saturation point is ~57n, where n represents the number of variables used in the classification. The second set of experiments was carried out to test the viability of signature extension. Random forest machine learning was applied to distinguish between acacia, eucalyptus, and pine trees, with Sentinel-2 imagery as input. The study area was divided into 19 Sentinel-2 tiles spanning the Mpumalanga, KwaZulu-Natal, Eastern Cape, and Western Cape provinces. Three separate random forest models were built using training data collected in one tile in Mpumalanga, one tile in KwaZulu-Natal, and one tile in the Eastern Cape, respectively. A fourth model was built using training data from all three source tiles. The four models were applied to all 19 Sentinel-2 tiles to map plantation genera. The results show that an OA of ~70% can be achieved if the training data are collected in areas with climates (rainfall seasonality) similar to those of the areas being mapped. In addition, it was found that the signature extension distance (i.e. the distance between the training data and the area being classified) should not exceed 500 km.
Description
Thesis (MSc)--Stellenbosch University, 2021.
Keywords
Machine learning, Remote sensing, High resolution imaging, Forest mapping, Forests and forestry -- Remote sensing, Tree farms -- Genome mapping, Algorithms, UCTD