Reliable likelihoods for out-of-distribution data from continuous-time normalising flows
Date
2024-12
Publisher
Stellenbosch University
Abstract
A continuous-time normalising flow is a deep generative model that allows for exact likelihood evaluation by defining a transformation between data and samples from a base distribution. The transformation is implicitly defined as the solution to a neural ordinary differential equation (neural ODE), and requires solution trajectories to be simulated by an ODE solver. This formulation eases invertibility, avoids the expensive determinant calculation of discrete-step normalising flows, and removes constraints on the neural network architecture underlying the transformation. We examine two problems related to continuous-time normalising flows, focusing on their application as generative models for image data. The first is the computational bottleneck in the simulation of solution trajectories, which can lead to long training times. The second is the reported phenomenon that normalising flow models assign higher likelihoods to out-of-distribution samples than to in-distribution samples. For the first problem, we explore whether regularising the Jacobian of the neural ODE during training can improve computational efficiency. Our results indicate that Jacobian regularisation can reduce the number of function evaluations required by an ODE solver when computing solution trajectories, and can offer additional benefits such as robustness and greater distance to decision boundaries in a classification problem. However, we argue that these benefits do not outweigh the time cost of simulating solution trajectories, and we turn to the conditional flow matching objective for continuous-time normalising flow training, as it circumvents the need to simulate solution trajectories. Models trained with this objective are called CFM models. For the second problem, we show that CFM models also assign higher likelihoods to out-of-distribution data. We then explore whether multimodality in the base distribution can improve matters.
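The simulation-free character of the conditional flow matching objective can be illustrated with a minimal sketch: a base sample and a data sample are joined by a straight-line path, and the model regresses onto that path's velocity, so no ODE solver appears in the training loop. The linear "vector field network" below is a placeholder of our own for illustration, not the architecture used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_loss(v_theta, x0, x1, t):
    """Conditional flow matching loss for a straight-line probability path.

    x_t interpolates between a base sample x0 and a data sample x1; the
    regression target u_t is the path's (constant) velocity x1 - x0.
    No solution trajectories are simulated at any point.
    """
    x_t = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    u_t = x1 - x0
    return np.mean(np.sum((v_theta(x_t, t) - u_t) ** 2, axis=1))

# Placeholder "network": a fixed linear map, purely illustrative.
W = rng.normal(size=(2, 2))
def v_theta(x, t):
    return x @ W.T

x0 = rng.normal(size=(8, 2))          # samples from the base distribution
x1 = rng.normal(size=(8, 2)) + 3.0    # stand-in "data" samples
t = rng.uniform(size=8)               # times drawn uniformly from [0, 1]

loss = cfm_loss(v_theta, x0, x1, t)
```

In practice the vector field would be a neural network trained by gradient descent on this loss; an ODE solver is then only needed at sampling and likelihood-evaluation time.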
The multimodal base distribution allows for class-conditional sampling, but can suffer from mode collapse in its sampling ability and does not lead to reliable likelihoods on out-of-distribution data. We also show that these CFM models tend to fit to pixel content rather than semantic content, corroborating observations from the literature on discrete-step flows. Motivated by this realisation, we instead train CFM models on image feature representations obtained from a pretrained classifier, a pretrained autoencoder, and an autoencoder trained from scratch. We find that feature representations which do not contain image-specific structure can lead to reliable likelihoods from CFM models on out-of-distribution data. However, CFM models trained on our proposed feature representations generate samples of lower quality, and we suggest avenues for future work.
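For readers unfamiliar with how a continuous-time flow yields exact likelihoods, the sketch below integrates the instantaneous change of variables for a hypothetical linear vector field f(z) = Az, chosen because its Jacobian trace is available in closed form; a real model would use a neural network and a stochastic trace estimator instead.

```python
import numpy as np

# Hypothetical linear vector field f(z) = A z, with a constant Jacobian trace.
A = np.array([[0.3, 0.1],
              [0.0, -0.2]])

def f(z):
    return z @ A.T

def trace_jacobian(z):
    # tr(df/dz) = tr(A) for a linear field; a neural-network field would
    # estimate this quantity, e.g. with Hutchinson's trace estimator.
    return np.trace(A)

def std_normal_logpdf(z):
    return -0.5 * np.sum(z ** 2, axis=-1) - 0.5 * z.shape[-1] * np.log(2 * np.pi)

def log_likelihood(x, n_steps=2000):
    """log p(x) under a CNF with a standard-normal base distribution.

    Integrates dz/dt = f(z) backwards from the data (t = 1) to the base
    (t = 0) with Euler steps, accumulating the trace of the Jacobian:
        log p(x) = log p_0(z(0)) - integral_0^1 tr(df/dz) dt.
    """
    dt = 1.0 / n_steps
    z = x.astype(float).copy()
    log_det = 0.0
    for _ in range(n_steps):
        log_det += trace_jacobian(z) * dt
        z = z - f(z) * dt  # Euler step of the reverse-time trajectory
    return std_normal_logpdf(z) - log_det

ll = log_likelihood(np.array([1.0, -0.5]))
```

This loop is the simulation cost discussed above: every likelihood evaluation (and, during classical maximum-likelihood training, every gradient step) requires solving the ODE.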