Initialisation of noise-regularised neural networks
Date
2021-12
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT: Recently, proper initialisation and stochastic regularisation techniques have
greatly improved the performance and ease of training of neural networks.
Some research has investigated how the magnitude of the initial weights impacts
optimisation, while other work has focused on how initialisation affects signal
propagation. In terms of noise regularisation, dropout has allowed networks
to train relatively quickly and reduced overfitting. Much research has gone
towards understanding why dropout improves the generalisation of networks.
Two major theories are (i) that it prevents neurons from becoming too dependent
on the output of other neurons and (ii) that dropout leads a network to
optimise a smoother loss landscape.
Despite this, theoretical understanding of the interaction between regularisation
and initialisation remains limited. The aim of this work was thus to
broaden our knowledge of how initialisation and stochastic regularisation interact
and what impact this has on network training and performance. Because
rectifier activation functions are widely used, we extended recent signal
propagation theory to rectifier networks that may use stochastic regularisation.
Our theory predicted a critical initialisation that allows the variance of
pre-activations to propagate stably with depth. However, our theory also indicated
that stochastic regularisation reduces the depth to which correlation information
can propagate in ReLU networks. We validated this theory and showed
that it accurately predicts a boundary across which networks do not train
effectively.
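The boundary behaviour described above can be illustrated with a short simulation. The sketch below is illustrative only and makes an assumption the abstract does not state: that the critical weight variance for ReLU networks with multiplicative noise takes the form σ_w² = 2 / (fan_in · E[ε²]), consistent with published noisy signal propagation results (this reduces to He initialisation when no noise is applied). It tracks the empirical pre-activation variance through a deep ReLU network with inverted dropout:

```python
import numpy as np

def critical_sigma_w(fan_in, keep_prob=1.0):
    """Hypothesised critical weight std for ReLU + inverted dropout."""
    # An inverted-dropout mask eps takes values in {0, 1/p}, so E[eps^2] = 1/p.
    second_moment = 1.0 / keep_prob
    # Assumed critical variance: sigma_w^2 = 2 / (fan_in * E[eps^2]);
    # with keep_prob = 1 this is He initialisation, sigma_w^2 = 2 / fan_in.
    return np.sqrt(2.0 / (fan_in * second_moment))

def simulate_variance(depth=50, width=1000, keep_prob=0.8, scale=1.0, seed=0):
    """Empirical signal variance after `depth` ReLU + dropout layers.

    `scale` multiplies the critical weight std: 1.0 sits on the critical
    line, < 1.0 is sub-critical, > 1.0 is super-critical.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)  # unit-variance input signal
    sigma = scale * critical_sigma_w(width, keep_prob)
    for _ in range(depth):
        W = rng.normal(0.0, sigma, size=(width, width))
        h = W @ np.maximum(x, 0.0)  # ReLU, then random linear map
        mask = rng.binomial(1, keep_prob, width) / keep_prob  # inverted dropout
        x = h * mask
    return x.var()
```

Under this assumed formula, weights scaled slightly below the critical value make the signal variance vanish with depth, while weights scaled above it make the variance explode; only the critical initialisation keeps it stable, which is the trainability boundary the theory predicts.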
We then extended the investigation by conducting a large-scale randomised
controlled trial, searching a region around the critical initialisation that
conserves the input signal, in the hope of finding initialisations that offer advantages in training or generalisation. We compared the critical initialisation
to 10 other initialisation schemes in a trial consisting of over
12,000 networks. We found that initialisations much larger than the critical initialisation
performed extremely poorly, while initialisations
close to the critical initialisation performed comparably to it. No initialisations
clearly outperformed the critical initialisation. Thus, we recommend it
as a safe default for practitioners.
AFRIKAANSE OPSOMMING: No summary available.
Description
Thesis (MSc)--Stellenbosch University, 2021.
Keywords
Deep learning (Machine learning), Neural networks (Computer science) -- Noise, Stochastic regularisation, Critical initialisation, Signal propagation, Neural network initialisation, Noise (Computer science), UCTD