Robust place recognition for vision-based SLAM systems using semantic information.

Date
2024-03
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT: When constructing a map of an unknown environment, it is important to be able to recognise when a previously visited area has been encountered again. In visual simultaneous localisation and mapping (SLAM), this is known as the ‘revisiting problem’. Such ‘loops’ in a robotic vehicle’s trajectory must be detected and closed, because doing so allows the estimate of that trajectory to be improved by removing drift that has accumulated over time. While many methods for performing loop closure exist for vision-based robots, they rarely make use of the context of the objects and features present in a scene, the so-called ‘semantic content’. This is unlike human beings, who recognise environments by the presence of, and relationships between, objects and landmarks. To address this lack of spatial-semantic awareness, this thesis proposes a system that describes a scene based on both its spatial structure and its semantic content. Such a system allows different scenes to be compared and matched with one another, so that loops can be detected and closed. The proposed solution uses a landmark-based 3D representation of the observed environment, measured with stereo cameras, combined with semantic information obtained from a convolutional neural network. These are combined to form the proposed scene descriptor, dubbed semantic multiview 2D projection (SeM2DP). The descriptor is then evaluated against a number of existing visual place recognition algorithms to gauge its performance in terms of accuracy and processing speed. The results of these experiments show that the proposed solution achieves comparable or better accuracy than other existing solutions, while requiring a smaller descriptor. The main cost of using the proposed algorithm is its relatively slow computation, although the other tested methods were found to be only around 50% faster to compute than SeM2DP.
These results show that the proposed solution is well suited to use in landmark-based visual SLAM systems. SeM2DP is also the better method to use if semantic data is being recorded for purposes beyond just improving navigation accuracy, as this offsets the main drawback of using SeM2DP over other existing methods.
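The abstract does not specify how SeM2DP combines landmark positions with semantic labels. Purely as a hypothetical illustration, the Python (NumPy) sketch below builds a descriptor in the spirit of the multiview 2D projection (M2DP) idea: the landmarks are centred and PCA-aligned, projected onto several planes, binned into concentric rings and angular sectors, and the stacked bin counts are compressed with an SVD whose first left and right singular vectors form the descriptor. The per-class binning, the in-plane basis, the function names semantic_m2dp and detect_loop, and all parameter values are assumptions made for this sketch, not the method described in the thesis. Loop candidates are then found by nearest-neighbour Euclidean distance against earlier descriptors.

```python
import numpy as np


def semantic_m2dp(points, labels, num_classes,
                  n_azimuth=4, n_elevation=16, n_rings=8, n_sectors=16):
    """Hypothetical M2DP-style descriptor with per-class bin counts.

    points: (N, 3) array of 3D landmark positions.
    labels: (N,) array of integer semantic class ids in [0, num_classes).
    Returns [u1; v1], of length n_azimuth*n_elevation + num_classes*n_rings*n_sectors.
    """
    pts = np.asarray(points, dtype=float)
    pts = pts - pts.mean(axis=0)
    # PCA alignment: express points in their principal-axis frame
    # (axis signs are left as returned by the SVD).
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    pts = pts @ vt.T
    labels = np.asarray(labels, dtype=int)
    r_max = max(np.linalg.norm(pts, axis=1).max(), 1e-12)

    rows = []
    for i in range(n_azimuth):                 # azimuths in [0, pi)
        theta = np.pi * i / n_azimuth
        for j in range(n_elevation):           # elevations in [0, pi/2)
            phi = 0.5 * np.pi * j / n_elevation
            m = np.array([np.cos(phi) * np.cos(theta),
                          np.cos(phi) * np.sin(theta),
                          np.sin(phi)])
            # Assumed deterministic orthonormal basis (e1, e2) of the plane with normal m.
            ref = np.array([0.0, 0.0, 1.0]) if abs(m[2]) < 0.9 else np.array([1.0, 0.0, 0.0])
            e1 = np.cross(ref, m)
            e1 /= np.linalg.norm(e1)
            e2 = np.cross(m, e1)
            x, y = pts @ e1, pts @ e2          # 2D coordinates of the projected points
            rho = np.hypot(x, y)
            # Quadratically spaced ring radii (k^2 / n_rings^2) * r_max, following M2DP.
            r_idx = np.clip(np.ceil(np.sqrt(rho / r_max) * n_rings).astype(int) - 1,
                            0, n_rings - 1)
            ang = np.arctan2(y, x) + np.pi     # in [0, 2*pi]
            a_idx = np.clip((ang / (2 * np.pi) * n_sectors).astype(int), 0, n_sectors - 1)
            # One histogram per semantic class, concatenated into a single row.
            flat = (labels * n_rings + r_idx) * n_sectors + a_idx
            rows.append(np.bincount(flat, minlength=num_classes * n_rings * n_sectors))

    A = np.asarray(rows, dtype=float)
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    u1, v1 = U[:, 0], Vt[0]
    if u1.sum() < 0:                           # resolve the SVD sign ambiguity
        u1, v1 = -u1, -v1
    return np.concatenate([u1, v1])


def detect_loop(query, database, threshold=0.3, exclude_recent=50):
    """Return the index of the closest earlier descriptor, or None.

    The most recent `exclude_recent` entries are skipped so that consecutive
    frames are not reported as loops; both parameters are hypothetical.
    """
    candidates = database[:max(0, len(database) - exclude_recent)]
    if not candidates:
        return None
    dists = np.array([np.linalg.norm(query - d) for d in candidates])
    best = int(np.argmin(dists))
    return best if dists[best] < threshold else None
```

Because u1 and v1 are unit vectors, every descriptor produced this way has norm √2, so a single distance threshold is comparable across scenes; the threshold and the window of skipped recent frames would have to be tuned on real data.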
AFRIKAANSE OPSOMMING (Afrikaans summary, English translation): When a map of an unknown environment is being compiled, it is essential to recognise areas that have already been visited. In visual simultaneous localisation and mapping (SLAM), this is known as the ‘revisiting problem’. It is important to be able to recognise and close such recurring ‘loops’ in a robotic vehicle’s trajectory; in doing so, estimates of the trajectory can be improved by removing errors that have accumulated over time. Although various methods for closing loops in vision-based robotic vehicles already exist, these methods seldom make use of a scene’s semantic context, that is, the information about objects and visual features that appear in the scene. This is in contrast to humans, who recognise scenes by such objects and landmarks, as well as by their relationships to one another. To address this lack of spatial and semantic awareness, this thesis proposes a system in which a scene can be described on the basis of both its spatial composition and its semantic context. Such a system compares scenes and finds correspondences between them in order to find and close loops in trajectories. The proposed solution makes use of landmark-based three-dimensional models of an environment as measured by stereoscopic cameras, as well as the semantic information provided by a convolutional neural network. These aspects are then combined to form the proposed scene descriptor; this descriptor is called semantic multiview 2D projection (SeM2DP). SeM2DP was evaluated against existing place recognition algorithms to assess its capability in terms of accuracy and speed. The results of these experiments show that the proposed solution achieves comparable or better accuracy than the existing solutions, with a smaller descriptor.
The greatest disadvantage of the proposed algorithm is a longer computation time, although the other existing methods are only about 50% faster. The experimental data show that the proposed solution is suitable for use in landmark-based visual SLAM systems. SeM2DP is also shown to be the better method in cases where semantic data is already being recorded for purposes other than navigation alone, since SeM2DP’s need for semantic data is its greatest disadvantage.
Description
Thesis (MEng)--Stellenbosch University, 2024.