Mapping real-world objects into virtual reality to facilitate interaction using 6DoF pose estimation

Date
2024-03
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT: Virtual reality (VR) has been around for decades, but it has only recently surged into the mainstream. With this renewed focus, various avenues are being explored to refine VR experiences. One of these avenues, and the focus of this thesis, is the mapping of the real world into the virtual space. Convolutional neural networks (CNNs) have become the cornerstone of computer vision for problems that remain difficult to solve with traditional algorithms, which makes them one of the primary solutions to explore for mapping real-world entities into VR. This thesis examines the challenges and methodologies associated with achieving this mapping. A significant portion of the research is dedicated to the creation of diverse, synthetically generated datasets. These datasets conform to the same format as the LineMOD dataset and cover a range of scenarios and objects, providing a comprehensive foundation for model training. The study places particular emphasis on EfficientPose, a state-of-the-art model designed for 6DoF pose estimation of an object. The methodology uses the headset’s built-in tracking and spatial mapping to capture the real world in the virtual domain. EfficientPose then tracks real-world objects using standard RGB Logitech 270 webcams connected to the computer. The output of EfficientPose is relayed to the VR experience, mapping the object in near real-time and enhancing immersion and interaction. ArUco marker-based object tracking was also implemented and served as a well-established baseline against which the deep learning approach was compared. While the EfficientPose model demonstrates potential in tracking various objects, especially in diverse VR interactions, the research sheds light on its limitations and areas for enhancement.
Specific objects were used to evaluate the model’s efficacy: the duck from the LineMOD dataset, a gun, a knife, and a cube. These objects were chosen to cover diverse shapes. The duck and the cube were further tested in both textured and textureless versions to evaluate the impact of texture on pose estimation. Additionally, two distinct pipelines were established for object integration: one where objects underwent a 3D reconstruction process, and another where the objects were taken directly from 3D printing models. The findings reveal that certain objects, due to their unique shape or symmetry, pose inherent challenges for pose estimation. Symmetric objects, in particular, present difficulties when the dataset is not tailored to account for their characteristics. The study also evaluates the model’s robustness and response under user interaction, particularly occlusions introduced by users’ hands. The results highlight the model’s commendable ability to handle such challenges, though with room for improvement to achieve seamless VR experiences. While the model’s versatility is a significant asset, its accuracy, especially in real-world VR deployments, requires further refinement. The study also touches upon the importance of a multi-camera implementation, especially in scenarios with heavy occlusions. Such an implementation improves the robustness of the system and is identified as a crucial next step for refinement.
AFRIKAANSE OPSOMMING: Virtuele werklikheid (VW) bestaan al vir dekades, maar dit het onlangs eers prominent in die kollig gestaan. Met hierdie hernude fokus, word verskeie roetes ondersoek om VW ervarings te verfyn. Een van hierdie roetes, en die fokus van hierdie tesis, is die kartering van die werklike wêreld na die virtuele ruimte. Met die opkoms van konvolusie neurale netwerke (KNNs) in rekenaarvisie, het hulle die hoeksteen geword om hierdie visie uitdagings aan te spreek wat moeilik is om met tradisionele algoritmes op te los, en dien as een van die primêre oplossings om werklike entiteite in VW te karteer. Hierdie tesis duik diep in die ingewikkelde uitdagings en metodologieë wat verband hou met hierdie kartering. ’n Beduidende deel van die navorsing is gewy aan die skepping van uiteenlopende, sinteties gegenereerde datastelle. Hierdie datastelle, noukeurig saamgestel, voldoen aan dieselfde formaat as die LineMOD datastel en sluit ’n reeks scenario’s en voorwerpe in, wat ’n omvattende basis verseker vir modelopleiding. Die studie plaas ’n besondere klem op die EfficientPose-model, ’n vooruitstrewende model ontwerp vir 6DoF posisie skatting van ’n voorwerp. Die navorsingsmetodologie het ingesluit die gebruik van die hoofstel se ingeboude opsporing en ruimtelike kartering om die werklike wêreld in die virtuele domein vas te lê. Daarna spoor EfficientPose werklike voorwerpe met behulp van standaard RGB Logitech 270 webkameras wat aan die rekenaar gekoppel is. Die uitsetdata van EfficientPose word dan na die VW-ervaring gestuur, en karteer die voorwerp byna in werklike tyd, wat die onderdompeling en interaksie verbeter. ’n Implementering van Aruco merker voorwerp opsporing is ook geïmplementeer en het gedien as ’n gevestigde basislyn waarteen die diepleer benadering vergelyk is.
Alhoewel die EfficientPose-model potensiaal toon in die opsporing van verskeie voorwerpe, veral in diverse VW-interaksies, werp die navorsing lig op die beperkings en areas vir verbetering daarvan. Spesifieke voorwerpe is gebruik om die model se doeltreffendheid te evalueer: die eend van die LineMOD datastel, ’n geweer, ’n mes en ’n kubus. Hierdie voorwerpe is gekies om diverse voorwerp vorms te toets. Die eend en die kubus is verder getoets in beide getekstureerde en ongetekstureerde weergawes om die impak van tekstuur op posisie skatting te evalueer. Verder is twee onderskeie pyplyne gevestig vir voorwerpintegrasie: een waar voorwerpe ’n 3D rekonstruksie proses ondergaan het, en ’n ander waar die voorwerpe direk uit 3D drukmodelle geneem is. Die bevindings toon dat sekere voorwerpe, as gevolg van hul unieke vorm of simmetrie, inherente uitdagings vir posisie skatting bied. Simmetriese voorwerpe, in die besonder, bied moeilikhede wanneer die datastel nie aangepas is om hul eienskappe te akkommodeer nie. Deur gebruikersinteraksies in te sluit, veral okklusies wat deur gebruikers se hande ingevoer is, evalueer die studie die model se robuustheid en reaksie. Die resultate beklemtoon die model se lofwaardige vermoë om sulke uitdagings te hanteer, alhoewel daar ruimte vir verbetering is om ’n naatlose VW-ervaring te bereik. Die bevindings toon dat alhoewel die model se veelvuldigheid ’n beduidende bate is, vereis sy akkuraatheid, veral in werklike VW-implementerings, verdere verfynings. Die studie raak ook aan die belangrikheid van ’n multi-kamera implementering, veral in scenario’s met swaar okklusies. Sulke ’n implementering verbeter nie net die robuustheid van die stelsel nie, maar is geïdentifiseer as ’n noodsaaklike volgende stap vir verfynings.
Description
Thesis (MEng)--Stellenbosch University, 2024.