dc.contributor.advisor | Brink, Willie | en_ZA |
dc.contributor.author | Josias, Shane | en_ZA |
dc.contributor.other | Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Applied Mathematics). | en_ZA |
dc.date.accessioned | 2020-02-19T13:20:04Z | |
dc.date.accessioned | 2020-04-28T12:19:42Z | |
dc.date.available | 2020-02-19T13:20:04Z | |
dc.date.available | 2020-04-28T12:19:42Z | |
dc.date.issued | 2020-03 | |
dc.identifier.uri | http://hdl.handle.net/10019.1/108109 | |
dc.description | Thesis (MSc)--Stellenbosch University, 2020. | en_ZA |
dc.description.abstract | ENGLISH ABSTRACT: An image can be described by the objects within it, as well as the interactions between
those objects. A pair of object labels together with an interaction label can be assembled
into what is known as a visual relationship, represented as a triplet of the form (subject,
predicate, object). Recognising visual relationships in a given image is a challenging task,
owing to the combinatorially large number of possible relationship triplets, which leads to
a so-called extreme classification problem, as well as the very long tail typically found in
the distribution of those triplets.
We investigate the efficacy of four strategies that could potentially address these issues.
Firstly, instead of predicting the full triplet we opt to predict each element separately.
Secondly, we investigate the use of shared network parameters to perform these separate
predictions in a basic multitask setting. Thirdly, we extend the multitask setting by
including an online ranking loss that acts on a trio of samples (an anchor, a positive
sample, and a negative sample). Semi-hard negative mining is used to select negative
samples. Finally, we consider a class-selective batch construction strategy to expose the
network to more of the many rare classes during mini-batch training. We view semi-hard
negative mining and class-selective batch construction as training data distribution
search, in the sense that they both attempt to carefully select training samples in order
to improve model performance. In addition to the aforementioned strategies, we also
introduce a means of evaluating model behaviour in visual relationship recognition. This
evaluation motivates the use of semantics.
Our experiments demonstrate that batch construction can improve performance on the
long tail, possibly at the expense of accuracy on the small number of dominating classes.
We also find that a basic multitask model neither improves nor impedes performance
in any significant way, but that its smaller size may be beneficial. Moreover, multitask
models trained with a ranking loss yield a decrease in performance, possibly due to
limited batch sizes. | en_ZA |
dc.description.abstract | AFRIKAANSE OPSOMMING: ’n Beeld kan beskryf word deur die voorwerpe daarin, asook die interaksies tussen daardie
voorwerpe. Twee voorwerpetikette saam met ’n interaksie-etiket staan bekend as ’n visuele
verwantskap, en word voorgestel met ’n drieling van die vorm (onderwerp, predikaat,
voorwerp). Die herkenning van visuele verwantskappe in ’n gegewe beeld is ’n uitdagende
taak, te danke aan die kombinatoriese groot aantal moontlike verwantskap-drielinge, wat
lei tot ’n sogenaamde ekstreme klassifikasieprobleem, sowel as ’n baie lang stert wat tipies
in die verspreiding van daardie moontlike drielinge voorkom.
Ons ondersoek die doeltreffendheid van vier strategieë om hierdie probleme aan te pak.
Eerstens, in plaas daarvan om die volledige drieling te voorspel, kies ons om elke element
afsonderlik te voorspel. Tweedens ondersoek ons die gebruik van gedeelde netwerkparameters
om hierdie afsonderlike voorspellings in ’n basiese multitaak-opstelling uit te voer.
Derdens brei ons die multitaak-opstelling uit deur ’n aanlyn rang-verliesfunksie in te sluit,
gedefinieer op ’n trio van datapunte (’n anker, ’n positiewe voorbeeld en ’n negatiewe
voorbeeld). Semi-moeilike negatiewe ontginning word gebruik om negatiewe voorbeelde
te selekteer. Laastens word daar gekyk na ’n klas-selektiewe bondelkonstruksie-strategie
om die netwerk bloot te stel aan meer van die seldsame klasse tydens mini-bondel afrigting.
Ons beskou semi-moeilike negatiewe ontginning en klas-selektiewe bondelkonstruksie
as vorme van ’n dataverspreidings-soektog. Albei poog om afrig-datapunte noukeurig te
kies om die model se prestasie te verbeter. Benewens die bogenoemde strategieë, stel
ons ook ’n manier voor om modelgedrag in die herkenning van visuele verwantskappe te
evalueer. Hierdie evaluering motiveer die gebruik van semantiek.
Ons eksperimente demonstreer dat bondelkonstruksie prestasie op die lang stert kan
verbeter, moontlik ten koste van akkuraatheid op die klein aantal dominante klasse. Ons
vind ook dat ’n basiese multitaakmodel nie die prestasie op ’n beduidende manier verbeter
of belemmer nie, maar dat die kleiner modelgrootte daarvan voordelig kan wees. Boonop
lei multitaakmodelle wat met ’n rang-verliesfunksie afgerig word, tot ’n laer prestasie,
moontlik as gevolg van beperkte bondelgroottes. | af_ZA |
dc.format.extent | vi, 60 pages : illustrations | en_ZA |
dc.language.iso | en_ZA | en_ZA |
dc.publisher | Stellenbosch : Stellenbosch University. | en_ZA |
dc.subject | Machine learning | en_ZA |
dc.subject | Neural networks (Computer science) | en_ZA |
dc.subject | Computer vision | en_ZA |
dc.subject | Computer multitasking | en_ZA |
dc.subject | Visual relationship recognition | en_ZA |
dc.subject | Electronic data processing -- Batch processing | en_ZA |
dc.subject | UCTD | |
dc.title | Multitask learning and data distribution search in visual relationship recognition | en_ZA |
dc.type | Thesis | en_ZA |
dc.description.version | Masters | |
dc.rights.holder | Stellenbosch University. | en_ZA |