Application of convolutional neural networks to building segmentation in aerial images

Abstract
ENGLISH ABSTRACT: Aerial image labelling has found relevance in diverse areas, including urban management, agriculture, climate, mining and cartography. As a result, research efforts to find fast and accurate algorithms have intensified. The current state-of-the-art results in this context have been achieved by deep convolutional neural networks (CNNs). This has been made possible by advances in computing technology, such as fast GPUs, and by the discovery of effective architectures. One of the main challenges in using deep CNNs is the need for a large set of ground-truth labels during the training phase. Moreover, good results require choosing suitable values for the many hyperparameters involved in constructing the model. In this thesis we focus on building segmentation from aerial images and study the effect of different hyperparameter values, paying particular attention to the generalisation ability of the resulting models. For all our experiments we use the same architecture and performance metric as those used in Mnih & Hinton (2012). Our investigation found the following main results: 1) small CNN filters perform as well as, or even better than, large filters; 2) the LeakyReLU activation function leads to a better precision-recall curve than the ReLU (Rectified Linear Unit) and Tanh activation functions; 3) batch normalization leads to a slightly worse precision-recall breakeven point than training without it, which is contrary to what other studies with different architectures have found. In addition, we investigate how well our models generalise to interpreting contexts that differ from the training sets. Drawing on our findings, we give recommendations on how to make deep CNN models more robust to variations in aerial images from other continents, such as Africa, where annotations are either unavailable or in short supply.
AFRIKAANS SUMMARY (OPSOMMING), English translation: Aerial image labelling has found relevance in various fields, including urban management, agriculture, climate, mining and cartography. As a result, research efforts to find fast and accurate algorithms have been intensified. The current state-of-the-art results in this context have been achieved by deep convolutional neural networks (CNNs). This is possible because of advances in computer technology, such as fast GPUs, and the discovery of effective architectures. One of the greatest challenges in using deep CNNs is the need for a large number of ground-truth labels during the training phase. In addition, one must choose suitable values for the many hyperparameters involved in constructing the model in order to obtain a good result. In this thesis we focus on building segmentation in aerial photographs and study the effect of different hyperparameter values, with special attention to the generalisation ability of the resulting models. For all our experiments we use the same architecture and performance metric as those used in Mnih and Hinton (2012). Our investigation found the following main results: 1) with regard to the size of CNN filters, small filters perform as well as, or even better than, large filters; 2) the LeakyReLU activation function leads to a better precision-recall curve than the ReLU (rectified linear unit) and Tanh activation functions; 3) batch normalization leads to a slightly worse breakeven point than training without batch normalization, which is contrary to what has been found in other studies with different architectures. In addition, we investigate how well our models generalise when interpreting contexts that differ from the training sets. Based on our findings, we give recommendations on how to make deep CNN models more robust to variations in aerial photographs of other continents, such as Africa, where annotations are either unavailable or scarce.
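The breakeven point referred to in the abstract is the value at which precision equals recall along the precision-recall curve. The following is a minimal Python/NumPy sketch of how such a value could be computed from a predicted building-probability map and a binary ground-truth mask. The function names `pr_curve_at_thresholds` and `breakeven_point` and the evenly spaced threshold grid are hypothetical choices made for illustration. To our understanding, Mnih & Hinton (2012) evaluate with relaxed precision and recall, which tolerate small misalignments near building boundaries. This sketch uses plain pixel-wise precision and recall and does not implement that relaxation.

```python
import numpy as np


def pr_curve_at_thresholds(probs, labels, thresholds):
    """Pixel-wise precision and recall of a probability map at each threshold.

    Hypothetical helper: `probs` holds per-pixel building probabilities and
    `labels` the binary ground-truth mask (same shape, any dimensionality).
    """
    probs = np.asarray(probs, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    precisions, recalls = [], []
    for t in thresholds:
        pred = probs >= t                  # pixels predicted as "building"
        tp = np.sum(pred & labels)         # correctly detected building pixels
        fp = np.sum(pred & ~labels)        # background pixels labelled building
        fn = np.sum(~pred & labels)        # building pixels that were missed
        # Convention (an assumption): precision is 1 when nothing is predicted,
        # and recall is 1 when the mask contains no building pixels.
        precisions.append(tp / (tp + fp) if tp + fp > 0 else 1.0)
        recalls.append(tp / (tp + fn) if tp + fn > 0 else 1.0)
    return np.array(precisions), np.array(recalls)


def breakeven_point(probs, labels, num_thresholds=101):
    """Approximate the precision-recall breakeven point on a threshold grid.

    Returns (breakeven value, threshold at which it was found). The first
    threshold minimising |precision - recall| is used.
    """
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    p, r = pr_curve_at_thresholds(probs, labels, thresholds)
    i = int(np.argmin(np.abs(p - r)))
    return (p[i] + r[i]) / 2.0, thresholds[i]
```

For example, `breakeven_point([0.9, 0.8, 0.4, 0.3, 0.1], [1, 1, 0, 1, 0])` finds a threshold between 0.3 and 0.4, where two of the three predicted pixels are buildings and two of the three building pixels are detected, giving a breakeven value of 2/3. A finer threshold grid, or interpolation between neighbouring points of the curve, gives a closer approximation when precision and recall never coincide exactly.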
Description
Thesis (MSc)--Stellenbosch University, 2018.
Keywords
Neural networks (Computer science), Computer graphics, Remote sensing, Image segmentation, UCTD