The impact of training set size and feature dimensionality on supervised object-based classification: a comparison of three classifiers

Myburgh, Gerhard (2012-12)

Thesis (MSc), Stellenbosch University, 2012.

Thesis

ENGLISH ABSTRACT: Supervised classifiers are commonly used in remote sensing to extract land cover information. They are, however, limited in their ability to produce sufficiently accurate land cover maps cost-effectively. Various factors affect the accuracy of supervised classifiers. Notably, the number of available training samples is known to significantly influence classifier performance, and obtaining a sufficient number of samples is not always practical. The support vector machine (SVM) performs well with a limited number of training samples, but little research has been done to evaluate its performance for geographic object-based image analysis (GEOBIA). GEOBIA also allows additional features to be easily integrated into the classification process, which may significantly influence classification accuracy. Two experiments were therefore developed and implemented in this research. The first compared the performance of object-based SVM, maximum likelihood (ML) and nearest neighbour (NN) classifiers using varying training set sizes. The second investigated the effect of feature dimensionality on classifier accuracy. A SPOT 5 subscene and a four-class classification scheme were used. For the first experiment, training set sizes ranging from 4 to 20 samples per land cover class were tested. The performance of all the classifiers improved significantly as the training set size increased. The ML classifier performed poorly when few (<10 per class) training samples were used, and the NN classifier performed poorly compared to SVM throughout the experiment. SVM was the superior classifier for all training set sizes, although ML achieved competitive results for sets of 12 or more training samples per class. In the second experiment, training set sizes were kept constant (20 and 10 samples per class) while an increasing number of features (1 to 22) was included. SVM consistently produced superior classification results.
SVM and NN were not significantly (negatively) affected by an increase in feature dimensionality, but ML's ability to perform under conditions of high feature dimensionality and few training areas was limited. Further investigations using a variety of imagery types, classification schemes and additional features; finding optimal combinations of training set size and number of features; and determining the effect of specific features should prove valuable in developing more cost-effective ways to process large volumes of satellite imagery.

KEYWORDS: Supervised classification, land cover, support vector machine, nearest neighbour classification, maximum likelihood classification, geographic object-based image analysis
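The abstract reports that ML struggled when many features were combined with few training areas. A standard property of Gaussian maximum likelihood classification offers one plausible explanation (an interpretation, not a finding stated in the abstract): each class needs a full covariance matrix, with d + d(d + 1)/2 parameters for d features, and a sample covariance computed from n samples has rank at most n - 1, so it can only be inverted when n > d. The minimal Python sketch below checks this numerically. It assumes full per-class covariance matrices, and all function names are hypothetical.

```python
import random

def sample_covariance(samples):
    """Unbiased sample covariance matrix of a list of d-dimensional samples."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    centred = [[s[j] - means[j] for j in range(d)] for s in samples]
    return [[sum(c[i] * c[k] for c in centred) / (n - 1) for k in range(d)]
            for i in range(d)]

def matrix_rank(m, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    rows, cols = len(a), len(a[0])
    scale = max(abs(x) for row in a for x in row) or 1.0
    rank = 0
    for col in range(cols):
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, rows), key=lambda r: abs(a[r][col]), default=None)
        if pivot is None or abs(a[pivot][col]) <= tol * scale:
            continue  # no usable pivot: this column adds nothing to the rank
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, rows):
            f = a[r][col] / a[rank][col]
            for c in range(col, cols):
                a[r][c] -= f * a[rank][c]
        rank += 1
    return rank

def ml_parameters_per_class(d):
    """Mean vector (d values) plus symmetric covariance matrix (d(d+1)/2 values)."""
    return d + d * (d + 1) // 2

def covariance_invertible(n, d):
    """A sample covariance from n samples has rank <= n - 1, so it needs n > d."""
    return n > d

# First feature count at which the per-class covariance becomes singular,
# for the two training set sizes used in the second experiment (1 to 22 features).
for n in (10, 20):
    singular = [d for d in range(1, 23) if not covariance_invertible(n, d)]
    print(n, singular[0] if singular else None)
```

Under these assumptions, with 10 samples per class the covariance is singular from 10 features onward, and with 20 samples per class from 20 features onward. Regularised or pooled covariance estimates are common workarounds for this limitation.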

AFRIKAANS SUMMARY: Supervised classifiers are regularly applied in remote sensing to extract land cover information. Such classifiers, however, have a limited ability to produce accurate land cover maps cost-effectively. Various factors affect the accuracy of supervised classifiers. It is particularly well known that the number of available training units has a significant influence on classifier accuracy, and it is not always practical to obtain sufficient numbers. The support vector machine (SVM) works well with limited numbers of training units. However, little research has been done to evaluate SVM's performance for geographic object-based image analysis (GEOBIA). GEOBIA facilitates the integration of additional features into the classification process, a factor that can considerably influence classification accuracy. Two experiments were consequently developed and implemented in this research. The first experiment compared the performance of object-based SVM, maximum likelihood (ML) and nearest neighbour (NN) classifiers using training sets of different sizes. The effect of feature dimensionality was investigated in the second experiment. A SPOT 5 subscene and a four-class classification scheme were used. Training set sizes of 4 to 20 per land cover class were tested in the first experiment. The performance of the classifiers improved significantly as training set size increased. ML performed poorly when few (<10 per class) training units were used, and NN consistently performed poorly compared to SVM. SVM performed best for all training set sizes, although ML was competitive for sets of 12 or more training units per class. Training set sizes were kept constant (20 and 10 units per class) in the second experiment, in which an increasing number of features (1 to 22) was added. SVM consistently delivered better classification results. SVM and NN were not significantly (negatively) affected by an increase in feature dimensionality, but ML's ability to perform under conditions of high feature dimensionality and few training areas was limited. Further investigations using a variety of images, classification schemes and additional features; finding optimal combinations of training set size and number of features; and determining the effect of specific features will be valuable in developing more cost-effective methods for processing large volumes of satellite imagery.

KEYWORDS: Supervised classification, land cover, support vector machine, nearest neighbour classification, maximum likelihood classification, geographic object-based image analysis
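The design of the first experiment, drawing a fixed number of training objects per class and measuring accuracy as that number grows, can be sketched in Python. This is an illustration under stated assumptions, not the thesis's implementation: image objects are assumed to be `(feature_vector, class_label)` pairs, the NN classifier is taken to be a plain 1-nearest-neighbour rule with Euclidean distance, accuracy is overall accuracy on a separate test set, and the step of 2 between training set sizes is arbitrary. An SVM or ML classifier could be substituted for `nearest_neighbour_predict`. All function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
import random

def nearest_neighbour_predict(train, x):
    """1-NN: return the label of the training object closest to x (Euclidean)."""
    _, label = min(train, key=lambda t: math.dist(t[0], x))
    return label

def sample_per_class(objects, n_per_class, rng):
    """Draw n_per_class training objects from each class, without replacement."""
    by_class = {}
    for features, label in objects:
        by_class.setdefault(label, []).append((features, label))
    return [obj for members in by_class.values()
            for obj in rng.sample(members, n_per_class)]

def overall_accuracy(train, test):
    """Fraction of test objects whose predicted label matches the reference."""
    hits = sum(nearest_neighbour_predict(train, x) == y for x, y in test)
    return hits / len(test)

def training_size_curve(objects, test, sizes=range(4, 21, 2), seed=0):
    """Map each per-class training set size to the resulting overall accuracy."""
    rng = random.Random(seed)
    return {n: overall_accuracy(sample_per_class(objects, n, rng), test)
            for n in sizes}

# Usage with two synthetic, well-separated classes (illustrative data only).
rng = random.Random(1)

def cluster(cx, cy, label, k):
    return [((rng.gauss(cx, 1), rng.gauss(cy, 1)), label) for _ in range(k)]

pool = cluster(0, 0, "water", 30) + cluster(10, 10, "vegetation", 30)
test_set = cluster(0, 0, "water", 10) + cluster(10, 10, "vegetation", 10)
print(training_size_curve(pool, test_set))
```

Because a single random draw per size is noisy, averaging the curve over several seeds would give a steadier comparison between classifiers.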

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/71655