Markerless versus marker-based 3D human pose estimation for strength and conditioning exercise identification.

Date
2023-03
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT: This research aims to contribute to the engineering of a virtual AI-based personal trainer that leverages modern advancements in computer vision. An artificially intelligent personal trainer has the potential to identify, record and assess performances of strength and conditioning exercises where a human trainer is not available. By viewing users through cameras, it could perform 3D human pose estimation to extract the motion of key joints through space. This results in a sequence of skeletons that is a compact representation of the exercise action being performed. In this thesis, we contribute to this concept by developing a motion capture system that records a subject and reconstructs their 3D pose. This can be called markerless motion capture to distinguish it from optical marker-based motion capture used in research and industry. Markerless solutions have the potential to make motion capture more accessible because they are non-intrusive, easy to use and affordable. This would be especially valuable in medical settings where clinicians use motion capture for the diagnosis and treatment of neuromuscular disorders. We used both a research-grade marker-based motion capture system and our markerless system to collect a dataset of motion capture samples of seven different classes of strength and conditioning exercises. We then compared their 3D pose reconstructions to evaluate markerless methods as a means to replace traditional marker-based methods. We studied the errors that must be addressed to make markerless technology truly accessible. From these investigations, we designed strategies that refine markerless pose reconstruction by factoring in the natural expected trajectory of bodily joints. With our most sophisticated method, we could reduce the top 10% error of the markerless motion capture by more than 25%. 
From the skeleton sequences of the different exercises we captured, we developed a skeleton-based exercise recognition system using deep learning models. We used a powerful graph convolutional network (GCN) architecture to learn spatial-temporal features for action identification. First, we explored transfer learning by pre-training on a large skeleton-based action dataset, which achieved perfect or near-perfect classification accuracy on our seven exercise classes. Second, we explored the more challenging task of one-shot action recognition. This makes for a more useful exercise identification system, since enrolling a new exercise requires only one example of it. We used the GCN model as a feature extractor to learn a metric that projects similar actions closer together in an embedding space and dissimilar actions further apart. Our model achieved a classification accuracy of 87.4% on the seven never-before-seen exercise classes. Our research shows that a markerless motion capture system is sufficient for capturing 3D pose in applications where accuracy and consistency are not of utmost importance, such as exercise identification, but that more research and development is required before markerless methods can replace the traditional marker-based motion capture used in clinical settings.
AFRIKAANS ABSTRACT (translated to English): An artificially intelligent personal trainer has the potential to identify, record and assess performances of conditioning exercises when a human trainer is not available. By estimating a user's 3D pose from exercises viewed through cameras, it can classify the exercise action being performed directly from the motion of the user's joints through space (known as a skeleton sequence). In this thesis, we contribute to this concept by developing a motion capture system that records a person and reconstructs their 3D pose. This can be called markerless motion capture to distinguish it from the optical marker-based motion capture used in research and industry. Markerless solutions have the potential to make motion capture more accessible by being non-intrusive, easy to use and affordable. We use both a research-grade marker-based Vicon system and our markerless system to collect a dataset of exercise motion. We then compare their 3D pose reconstructions to evaluate markerless methods as a means to replace traditional marker-based methods. We study the errors that must be addressed to make markerless technology truly disruptive. From these investigations, we design strategies that refine markerless pose reconstruction by factoring in the natural expected trajectory of bodily joints. With our most sophisticated method, we can reduce the top 10% error of the markerless motion capture by more than 25%. From the exercise skeleton sequences, we develop a skeleton-based exercise recognition system built on deep learning models. We use a powerful graph convolutional network (GCN) architecture to extract spatial-temporal features for action identification.
First, we explored transfer learning by pre-training on a large skeleton-based action dataset, which achieved perfect or near-perfect accuracy on our seven exercise classes. Second, we explored the more challenging task of one-shot action recognition. This makes for a more useful exercise identification system, since only one example is needed to enrol an exercise. We used the GCN model as a feature extractor to learn a metric that projects similar actions closer together in an embedding space and dissimilar actions further apart. Our model achieved a classification accuracy of 87.4% on the seven never-before-seen exercise classes.
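The one-shot enrolment idea in the abstract can be illustrated with a minimal sketch. It assumes a feature extractor (the GCN in the thesis) has already mapped each skeleton sequence to a fixed-length embedding vector. The exercise names, the toy three-dimensional embeddings, the choice of cosine similarity, and the function names below are all hypothetical illustrations, not taken from the thesis.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_one_shot(query, enrolled):
    """Return the label whose single enrolled embedding is most similar to the query."""
    if not enrolled:
        raise ValueError("no exercises enrolled")
    return max(enrolled, key=lambda label: cosine_similarity(query, enrolled[label]))


# Hypothetical enrolment: exactly one example embedding per exercise class.
# Real embeddings would come from the trained feature extractor.
enrolled = {
    "squat": [0.9, 0.1, 0.0],
    "deadlift": [0.1, 0.9, 0.1],
    "push_up": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a new, unlabelled recording.
query = [0.8, 0.2, 0.1]
predicted = classify_one_shot(query, enrolled)
print(predicted)
```

Because classification reduces to a nearest-neighbour lookup in the embedding space, adding a new exercise in this sketch only means adding one entry to `enrolled`; no retraining of the feature extractor is involved.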
Description
Thesis (MEng)--Stellenbosch University, 2023.