A computer vision framework towards automated scene understanding & analysis
Date
2025-03
Abstract
Recent advances in artificial intelligence, together with increasingly capable computer hardware,
have significantly advanced the field of computer vision, the field of study that enables computers
to “see” and extract meaningful information from visual inputs, much as human perception does.
A prominent application area within computer vision is scene understanding. Many powerful
approaches to scene understanding combine computer vision tasks to infer semantic information
about scenes, allowing computers to understand the relationships between objects and their
environments. Such tasks include object detection, recognition, tracking, pose estimation, and
contextual reasoning. Most current computer vision algorithms are based on deep learning but differ
significantly in architecture. The models investigated in this thesis use either the conventional
backbone, neck and head architecture or alternative transformer-based architectures.
Although computer vision applications are diverse, some fields have not yet fully benefited from
these developments. One such field is energy auditing, a process undertaken to evaluate and improve
the energy management of buildings.
This thesis develops a proof-of-concept framework that extracts information about the appliances
present in a given building scene or environment by means of object detection and object tracking.
The framework trains several object detection models and recommends the best-performing one,
which is then used together with an object tracking model to analyse video footage of the
environments to be audited. The framework covers the processing of raw data, the training of
object detection models on the supplied data, and the deployment of the trained model on unseen
video footage.
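The step of training several detectors and recommending the best performer can be pictured as a search over models and their hyperparameters, ranked by a validation score. The following is a minimal sketch under stated assumptions, not the thesis's implementation: `select_best_model`, the `search_spaces` layout and the `evaluate` callback (assumed to train a model with the given parameters and return a score where higher is better, such as validation mAP) are all hypothetical, and exhaustive grid search is only one possible tuning strategy.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Tuple


def select_best_model(
    search_spaces: Dict[str, Dict[str, List[Any]]],
    evaluate: Callable[[str, Dict[str, Any]], float],
) -> Tuple[str, Dict[str, Any], float]:
    """Grid-search each model's hyperparameters and return the overall best.

    Hypothetical helper: `evaluate(model_name, params)` is assumed to train the
    named model with `params` and return a validation score (higher is better).
    """
    best: Optional[Tuple[str, Dict[str, Any], float]] = None
    for model_name, space in search_spaces.items():
        keys = list(space)
        # Try every combination of candidate values for this model.
        for values in product(*(space[k] for k in keys)):
            params = dict(zip(keys, values))
            score = evaluate(model_name, params)
            # Keep the single best (model, params, score) seen across all models.
            if best is None or score > best[2]:
                best = (model_name, params, score)
    if best is None:
        raise ValueError("search_spaces must contain at least one model")
    return best
```

In practice `evaluate` would wrap a full training and validation run, which is why the search space per model is usually kept small.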
A structured literature review is conducted to investigate the literature on computer vision
applications within the energy auditing domain, and the fundamentals of deep learning, computer
vision and energy auditing are also explored. The proposed framework is first applied to a subset
of a widely used public benchmark dataset to verify that it functions correctly.
Subsequently, to further assess the framework’s performance and applicability, it is applied to
a novel case study dataset provided by an industry partner, containing images of appliances
common in an educational institution. The framework supports hyperparameter tuning to
determine the best parameters for each model trained. The best-performing model, RT-DETR,
is then used to detect and track the appliances of interest, yielding the number of appliances
present. This information is an essential input for computing the environment’s energy
consumption.
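Counting appliances from video relies on tracking: a tracker keeps the same identifier for an object across frames, so counting distinct identifiers per class counts each appliance once rather than once per frame. The sketch below illustrates this idea under an assumed record format of `(frame_index, track_id, class_name)` tuples; the function name and format are hypothetical, and converting a particular tracker's output into this form is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_appliances(tracks: Iterable[Tuple[int, int, str]]) -> Dict[str, int]:
    """Count distinct tracked appliances per class.

    Hypothetical helper: each record is assumed to be
    (frame_index, track_id, class_name) produced by an object tracker.
    """
    ids_per_class: Dict[str, Set[int]] = defaultdict(set)
    for _frame, track_id, class_name in tracks:
        # The same track_id seen in many frames is added only once.
        ids_per_class[class_name].add(track_id)
    return {cls: len(ids) for cls, ids in ids_per_class.items()}
```

Note that identity switches, where a tracker assigns a new ID to an object it lost and re-acquired, would inflate these counts, so tracking quality directly affects the appliance totals.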