A computer vision framework towards automated scene understanding & analysis

Sarah-lee de Greeff

Files

DE GREEF Sarah-Lee.PDF (2.46 MB)

Date

2025-03

Authors

Sarah-lee de Greeff

Abstract

Recent advances in artificial intelligence and the increased capability of computer hardware have significantly advanced the field of computer vision, a field of study that enables computers to “see” and extract meaningful information from visual inputs in a manner similar to human perception. A prominent application area within computer vision is scene understanding. Various powerful approaches to scene understanding employ computer vision tasks to extract semantic information about scenes, allowing computers to understand relationships between objects and their environments. Such tasks include object detection, recognition, tracking, pose estimation, and contextual reasoning. Most computer vision algorithms are deep-learning-based but differ significantly in architecture. The computer vision tasks investigated in this thesis utilise architectures consisting of backbone, neck, and head components, as well as alternative transformer architectures. Although computer vision applications are diverse, there remain fields that have not yet fully benefited from these developments. One such field is energy auditing, a process undertaken to evaluate and improve the energy management of buildings. In this thesis, a proof-of-concept framework is developed that extracts information about the appliances present in a given building scene or environment by employing object detection and object tracking. The objective of the proposed framework is to train various object detection models and recommend the best-performing model for further implementation, in conjunction with object tracking models, to analyse video footage of environments to be audited. The framework facilitates the processing of raw data, the training of object detection models on that data, and the deployment of the trained model on unseen video footage.
A structured literature review is conducted in this thesis to investigate the literature on computer vision applications within the energy auditing domain. The fundamentals of deep learning, computer vision, and energy auditing are also explored. The proposed framework is first applied to a subset of a widely accepted benchmark dataset to verify that it functions correctly. Subsequently, to further assess the framework’s performance and applicability, it is applied to a novel case study dataset provided by an industry partner, containing images of appliances common in an educational institution. The framework facilitates hyperparameter tuning to determine the best hyperparameters for each model being trained. The best-performing model, RT-DETR, is then used to detect and track appliances of interest, providing the number of appliances present. This information is essential for computing the environment’s energy consumption.
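The final step described above, turning tracked detections into appliance counts that feed an energy computation, can be illustrated with a minimal sketch. It is not the thesis's implementation. The record layout `(frame_index, track_id, class_label)`, the function names `count_appliances` and `estimate_energy_kwh`, the appliance classes, the power ratings, and the operating hours are all hypothetical values chosen for illustration. The sketch assumes that the tracker assigns one stable ID to each physical appliance, so that counting distinct IDs per class gives the number of appliances.

```python
from collections import defaultdict

# Hypothetical tracker output: (frame_index, track_id, class_label).
# In practice these records would come from a detector plus a tracker.
detections = [
    (0, 1, "monitor"),
    (0, 2, "monitor"),
    (1, 1, "monitor"),
    (1, 3, "projector"),
    (2, 2, "monitor"),
    (2, 3, "projector"),
]

# Assumed rated power per appliance class in watts (illustrative only).
rated_power_w = {"monitor": 30.0, "projector": 250.0}


def count_appliances(records):
    """Count distinct track IDs per class across all frames."""
    ids_per_class = defaultdict(set)
    for _frame, track_id, label in records:
        ids_per_class[label].add(track_id)
    return {label: len(ids) for label, ids in ids_per_class.items()}


def estimate_energy_kwh(counts, power_w, hours):
    """Estimate energy use, assuming every appliance runs at rated power."""
    total_w = sum(counts[label] * power_w[label] for label in counts)
    return total_w * hours / 1000.0


counts = count_appliances(detections)
energy = estimate_energy_kwh(counts, rated_power_w, hours=8.0)
print(counts)
print(energy)
```

Counting distinct track IDs rather than raw detections avoids counting the same appliance once per frame, which is the reason tracking is paired with detection when analysing video footage.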

URI

https://scholar.sun.ac.za/handle/10019.1/131922

Collections

Graduation - 2025 - March (Embargoes)