Browsing by Author "Josias, Shane"
Now showing 1 - 2 of 2
- Item: Multitask learning and data distribution search in visual relationship recognition (Stellenbosch: Stellenbosch University, 2020-03)
  Josias, Shane; Brink, Willie; Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Applied Mathematics).

  ENGLISH ABSTRACT: An image can be described by the objects within it, as well as the interactions between those objects. A pair of object labels together with an interaction label can be assembled into what is known as a visual relationship, represented as a triplet of the form (subject, predicate, object). Recognising visual relationships in a given image is a challenging task, owing to the combinatorially large number of possible relationship triplets, which leads to a so-called extreme classification problem, as well as the very long tail typically found in the distribution of those triplets. We investigate the efficacy of four strategies that could potentially address these issues. Firstly, instead of predicting the full triplet, we opt to predict each element separately. Secondly, we investigate the use of shared network parameters to perform these separate predictions in a basic multitask setting. Thirdly, we extend the multitask setting by including an online ranking loss that acts on a trio of samples (an anchor, a positive sample, and a negative sample), with semi-hard negative mining used to select negative samples. Finally, we consider a class-selective batch construction strategy that exposes the network to more of the many rare classes during mini-batch training. We view semi-hard negative mining and class-selective batch construction as forms of training data distribution search, in the sense that both attempt to carefully select training samples in order to improve model performance. In addition to these strategies, we introduce a means of evaluating model behaviour in visual relationship recognition, an evaluation that motivates the use of semantics. Our experiments demonstrate that batch construction can improve performance on the long tail, possibly at the expense of accuracy on the small number of dominant classes. We also find that a basic multitask model neither improves nor impedes performance in any significant way, but that its smaller size may be beneficial. Moreover, multitask models trained with a ranking loss yield a decrease in performance, possibly due to limited batch sizes.
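  The online ranking loss with semi-hard negative mining described in this abstract is a standard technique, so a rough illustration may help. The following minimal PyTorch sketch mines semi-hard negatives within a mini-batch; it is not the thesis code, and the function name, margin value, and choice of Euclidean distance are all assumptions:

  ```python
  import torch
  import torch.nn.functional as F

  def semi_hard_triplet_loss(embeddings, labels, margin=0.2):
      """Illustrative online triplet loss with semi-hard negative mining.

      A semi-hard negative is farther from the anchor than the positive,
      but still inside the margin: d(a, p) < d(a, n) < d(a, p) + margin.
      """
      # Pairwise Euclidean distances between all embeddings in the batch.
      dists = torch.cdist(embeddings, embeddings)

      same = labels.unsqueeze(0) == labels.unsqueeze(1)
      eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
      pos_mask = same & ~eye   # same class, excluding the anchor itself
      neg_mask = ~same         # different class

      losses = []
      for a in range(len(labels)):
          for p in pos_mask[a].nonzero(as_tuple=True)[0]:
              d_ap = dists[a, p]
              # Candidate semi-hard negatives for this anchor-positive pair.
              cand = neg_mask[a] & (dists[a] > d_ap) & (dists[a] < d_ap + margin)
              if cand.any():
                  d_an = dists[a][cand].min()  # hardest of the semi-hard negatives
                  losses.append(F.relu(d_ap - d_an + margin))
      return torch.stack(losses).mean() if losses else embeddings.new_zeros(())
  ```

  Class-selective batch construction intervenes one step earlier, at the sampler level, deliberately over-representing rare classes when each mini-batch is assembled rather than reweighting within an already-drawn batch.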
- Item: Reliable likelihoods for out-of-distribution data from continuous-time normalising flows (Stellenbosch University, 2024-12)
  Josias, Shane; Brink, Willie; Stellenbosch University. Faculty of Science. Dept. of Applied Mathematics.

  ENGLISH ABSTRACT: A continuous-time normalising flow is a deep generative model that allows for exact likelihood evaluation by defining a transformation between data and samples from a base distribution. The transformation is implicitly defined as the solution to a neural ordinary differential equation (neural ODE), and requires solution trajectories to be simulated by an ODE solver. This formulation eases invertibility, avoids the expensive determinant calculation in discrete-step normalising flows, and removes constraints on the neural network architecture underlying the transformation. We examine two problems related to continuous-time normalising flows, focusing on their application as generative models for image data. The first is the computational bottleneck in the simulation of solution trajectories, which can lead to long training times. The second relates to the reported phenomenon that normalising flow models assign higher likelihoods to out-of-distribution samples than they do to in-distribution samples. For the first problem, we explore whether regularising the Jacobian of the neural ODE during training can improve computational efficiency. Our results indicate that Jacobian regularisation can reduce the number of function evaluations required by an ODE solver when computing solution trajectories, and can offer additional benefits such as robustness and greater distance to decision boundaries in a classification problem. However, we argue that these benefits do not outweigh the time cost of simulating solution trajectories, and we turn to the conditional flow matching objective for training continuous-time normalising flows, as it circumvents the need to simulate solution trajectories. Models trained with this objective are called CFM models. For the second problem, we show that CFM models also assign higher likelihoods to out-of-distribution data. We then explore whether multimodality in the base distribution can improve matters. A multimodal base distribution allows for class-conditional sampling, but can suffer from mode collapse in its sampling ability and does not lead to reliable likelihoods on out-of-distribution data. We also show that these CFM models tend to fit to pixel content rather than semantic content, corroborating observations from the literature for discrete-step flows. Motivated by this realisation, we instead train CFM models on image feature representations obtained from a pretrained classifier, a pretrained autoencoder, and an autoencoder trained from scratch. We find that feature representations which do not contain image-specific structure can lead to reliable likelihoods from CFM models on out-of-distribution data. We do find, however, that CFM models trained on these feature representations generate samples of lower quality, and we suggest avenues for future work.
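  As background to the simulation-free training mentioned in this abstract, the conditional flow matching objective can be sketched in a few lines: sample a time and a point on a conditional probability path between noise and data, then regress the network onto the path's target velocity, with no ODE solver in the training loop. This is a minimal sketch of the published technique, not the thesis code; the straight-line (optimal-transport) path, the `vf_net` interface, and `sigma_min` are illustrative assumptions:

  ```python
  import torch

  def cfm_loss(vf_net, x1, sigma_min=1e-4):
      """Illustrative conditional flow matching loss on a batch of data x1."""
      x0 = torch.randn_like(x1)                      # base (noise) samples
      t = torch.rand(x1.shape[0], device=x1.device)  # one time per example
      t_ = t.view(-1, *([1] * (x1.dim() - 1)))       # broadcast over image dims

      # Straight-line path: x_t = (1 - (1 - sigma_min) t) x0 + t x1,
      # whose conditional target velocity is u_t = x1 - (1 - sigma_min) x0.
      x_t = (1 - (1 - sigma_min) * t_) * x0 + t_ * x1
      target = x1 - (1 - sigma_min) * x0

      pred = vf_net(x_t, t)  # network predicts a velocity field (assumed signature)
      return ((pred - target) ** 2).mean()
  ```

  Sampling and likelihood evaluation still require an ODE solver at test time; only the training loop avoids trajectory simulation, which is the computational saving the abstract refers to.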