Multitask learning and data distribution search in visual relationship recognition

dc.contributor.advisor: Brink, Willie [en_ZA]
dc.contributor.author: Josias, Shane [en_ZA]
dc.contributor.other: Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Applied Mathematics). [en_ZA]
dc.date.accessioned: 2020-02-19T13:20:04Z
dc.date.accessioned: 2020-04-28T12:19:42Z
dc.date.available: 2020-02-19T13:20:04Z
dc.date.available: 2020-04-28T12:19:42Z
dc.date.issued: 2020-03
dc.description: Thesis (MSc)--Stellenbosch University, 2020. [en_ZA]
dc.description.abstract: ENGLISH ABSTRACT: An image can be described by the objects within it, as well as by the interactions between those objects. A pair of object labels together with an interaction label can be assembled into what is known as a visual relationship, represented as a triplet of the form (subject, predicate, object). Recognising visual relationships in a given image is a challenging task, owing to the combinatorially large number of possible relationship triplets, which leads to a so-called extreme classification problem, as well as to the very long tail typically found in the distribution of those triplets. We investigate the efficacy of four strategies that could potentially address these issues. Firstly, instead of predicting the full triplet, we opt to predict each element separately. Secondly, we investigate the use of shared network parameters to perform these separate predictions in a basic multitask setting. Thirdly, we extend the multitask setting by including an online ranking loss that acts on a trio of samples (an anchor, a positive sample, and a negative sample). Semi-hard negative mining is used to select negative samples. Finally, we consider a class-selective batch construction strategy to expose the network to more of the many rare classes during mini-batch training. We view semi-hard negative mining and class-selective batch construction as training data distribution search, in the sense that both attempt to carefully select training samples in order to improve model performance. In addition to the aforementioned strategies, we also introduce a means of evaluating model behaviour in visual relationship recognition. This evaluation motivates the use of semantics. Our experiments demonstrate that batch construction can improve performance on the long tail, possibly at the expense of accuracy on the small number of dominant classes. We also find that a basic multitask model neither improves nor impedes performance in any significant way, but that its smaller size may be beneficial. Moreover, multitask models trained with a ranking loss perform worse, possibly due to limited batch sizes. [en_ZA]
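As a rough illustration of the third strategy, the sketch that follows computes an online triplet (ranking) loss over one mini-batch, selecting negatives in the semi-hard style popularised by FaceNet. It is not the thesis's implementation: the squared Euclidean distance, the margin of 0.2, the fallback rule used when no suitable negative exists, and the names `squared_distance` and `semi_hard_triplet_loss` are all assumptions for illustration. `labels` stands for whichever class label (for example, the predicate) defines positives in a given task.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def semi_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Mean triplet loss over all (anchor, positive) pairs in a batch.

    Hypothetical sketch. For each anchor a and positive p (same label), the
    chosen negative is the closest one that is still farther from a than p is.
    If it also lies within the margin it is semi-hard and gives a positive
    loss; otherwise its loss term is zero. If every negative is closer than
    the positive, fall back to the farthest negative.
    Returns (mean_loss, triplets) with triplets as (a, p, n) index tuples.
    """
    n = len(embeddings)
    # Pairwise distance matrix for the whole batch ("online" mining).
    dist = [[squared_distance(embeddings[i], embeddings[j]) for j in range(n)]
            for i in range(n)]
    losses, triplets = [], []
    for a in range(n):
        negatives = [k for k in range(n) if labels[k] != labels[a]]
        if not negatives:
            continue
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue
            d_ap = dist[a][p]
            farther = [k for k in negatives if dist[a][k] > d_ap]
            if farther:
                neg = min(farther, key=lambda k: dist[a][k])
            else:
                neg = max(negatives, key=lambda k: dist[a][k])
            # Hinge: penalise if the negative is not at least `margin` farther.
            losses.append(max(0.0, d_ap - dist[a][neg] + margin))
            triplets.append((a, p, neg))
    mean_loss = sum(losses) / len(losses) if losses else 0.0
    return mean_loss, triplets


# Toy batch: two classes with two 2-D embeddings each.
embeddings = [[0.0, 0.0], [0.1, 0.0], [0.3, 0.0], [1.0, 0.0]]
labels = [0, 0, 1, 1]
loss, triplets = semi_hard_triplet_loss(embeddings, labels, margin=0.2)
print(triplets)        # → [(0, 1, 2), (1, 0, 2), (2, 3, 0), (3, 2, 1)]
print(round(loss, 4))  # → 0.2225
```

Because every candidate negative comes from the same mini-batch, small batches offer few semi-hard candidates, which is consistent with the abstract's suggestion that limited batch sizes may explain the weaker ranking-loss results.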
dc.description.abstract: AFRIKAANSE OPSOMMING (Afrikaans summary, in English translation): An image can be described by the objects in it, as well as by the interactions between those objects. Two object labels together with an interaction label are known as a visual relationship, and are represented by a triplet of the form (subject, predicate, object). Recognising visual relationships in a given image is a challenging task, owing to the combinatorially large number of possible relationship triplets, which leads to a so-called extreme classification problem, as well as to the very long tail that typically occurs in the distribution of those triplets. We investigate the effectiveness of four strategies for tackling these problems. First, instead of predicting the full triplet, we choose to predict each element separately. Second, we investigate the use of shared network parameters to perform these separate predictions in a basic multitask setup. Third, we extend the multitask setup by including an online ranking loss function, defined on a trio of data points (an anchor, a positive example and a negative example). Semi-hard negative mining is used to select negative examples. Finally, we consider a class-selective batch construction strategy to expose the network to more of the rare classes during mini-batch training. We regard semi-hard negative mining and class-selective batch construction as forms of data distribution search: both attempt to carefully select training data points in order to improve the model's performance. In addition to the above strategies, we also propose a way to evaluate model behaviour in visual relationship recognition. This evaluation motivates the use of semantics. Our experiments demonstrate that batch construction can improve performance on the long tail, possibly at the expense of accuracy on the small number of dominant classes. We also find that a basic multitask model neither improves nor hinders performance in any significant way, but that its smaller model size can be beneficial. Moreover, multitask models trained with a ranking loss function perform worse, possibly due to limited batch sizes. [af_ZA]
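The abstract does not spell out how the class-selective batches are built. One plausible scheme, sketched here purely as an assumption, draws classes uniformly for each mini-batch and then draws a fixed number of samples from each chosen class, so that rare classes are seen far more often than under uniform sampling of examples. The function `class_selective_batches`, its parameters, and the predicate names in the toy label list are hypothetical.

```python
import random
from collections import defaultdict


def class_selective_batches(labels, classes_per_batch, samples_per_class,
                            num_batches, seed=0):
    """Yield mini-batches of sample indices: choose classes first, then samples.

    Hypothetical sketch. Classes are drawn uniformly without replacement
    within a batch, so a rare class is as likely to be picked as a dominant
    one. Samples are drawn without replacement when the class has enough of
    them, and with replacement otherwise.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)
    if classes_per_batch > len(classes):
        raise ValueError("classes_per_batch exceeds the number of distinct classes")
    for _ in range(num_batches):
        batch = []
        for c in rng.sample(classes, classes_per_batch):
            pool = by_class[c]
            if len(pool) >= samples_per_class:
                batch.extend(rng.sample(pool, samples_per_class))
            else:
                batch.extend(rng.choices(pool, k=samples_per_class))
        yield batch


# Toy long-tailed label list (hypothetical predicates): "riding" is 2% of the data.
labels = ["on"] * 90 + ["wearing"] * 8 + ["riding"] * 2
batches = list(class_selective_batches(labels, classes_per_batch=2,
                                       samples_per_class=2, num_batches=50, seed=1))
rare_share = sum(labels[i] == "riding" for b in batches for i in b) / sum(len(b) for b in batches)
print(f"share of 'riding' samples: {rare_share:.2f} (vs 0.02 under uniform sampling)")
```

Drawing a fixed number of samples from each chosen class also guarantees anchor-positive pairs in every batch, which an online triplet loss needs; whether the thesis combines the two strategies in this way is not stated in the record. The flip side, matching the abstract's finding, is that dominant classes are shown less often than their frequency would suggest.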
dc.description.version: Masters
dc.format.extent: vi, 60 pages : illustrations [en_ZA]
dc.identifier.uri: http://hdl.handle.net/10019.1/108109
dc.language.iso: en_ZA [en_ZA]
dc.publisher: Stellenbosch : Stellenbosch University. [en_ZA]
dc.rights.holder: Stellenbosch University. [en_ZA]
dc.subject: Machine learning [en_ZA]
dc.subject: Neural networks (Computer science) [en_ZA]
dc.subject: Computer vision [en_ZA]
dc.subject: Computer multitasking [en_ZA]
dc.subject: Visual relationship recognition [en_ZA]
dc.subject: Electronic data processing -- Batch processing [en_ZA]
dc.subject: UCTD
dc.title: Multitask learning and data distribution search in visual relationship recognition [en_ZA]
dc.type: Thesis [en_ZA]
Files
Original bundle: josias_multitask_2020.pdf (4.02 MB, Adobe Portable Document Format)
License bundle: license.txt (1.71 KB, Plain Text)