Initialisation of noise-regularised neural networks

Date
2021-12
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT: Recently, proper initialisation and stochastic regularisation techniques have greatly improved the performance and ease of training of neural networks. Some research has examined how the magnitude of the initial weights impacts optimisation, while other work has focused on how initialisation affects signal propagation. In terms of noise regularisation, dropout has allowed networks to train relatively quickly while reducing overfitting. Much research has gone towards understanding why dropout improves the generalisation of networks. Two major theories are (i) that it prevents neurons from becoming too dependent on the outputs of other neurons and (ii) that it leads a network to optimise a smoother loss landscape. Despite this, our theoretical understanding of the interaction between regularisation and initialisation remains sparse. Thus, the aim of this work was to broaden our knowledge of how initialisation and stochastic regularisation interact and what impact this has on network training and performance. Because rectifier activation functions are widely used, we extended recent signal propagation theory to rectifier networks that may use stochastic regularisation. Our theory predicted a critical initialisation that allows for stable propagation of the pre-activation variance. However, our theory also indicated that stochastic regularisation reduces the depth to which correlation information can propagate in ReLU networks. We validated this theory and showed that it accurately predicts a boundary across which networks do not train effectively. We then extended the investigation by conducting a large-scale randomised controlled trial, searching the region around the critical initialisation that conserves the input signal, in the hope of finding initialisations that provide advantages in training or generalisation. We compared the critical initialisation to 10 other initialisation schemes in a trial consisting of over 12,000 networks. We found that initialisations much larger than the critical initialisation yield extremely poor performance, while initialisations close to the critical initialisation yield similar performance. No initialisation clearly outperformed the critical initialisation. We therefore recommend it as a safe default for practitioners.
AFRIKAANS ABSTRACT: No abstract available.
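For rectifier networks with multiplicative noise such as dropout, the kind of critical initialisation the abstract refers to amounts to scaling the familiar He weight variance by the second moment of the noise. Below is a minimal NumPy sketch of that idea, assuming the standard noisy-ReLU criticality condition sigma_w^2 = 2 / (mu2 * fan_in), where mu2 = E[eps^2] for multiplicative noise eps with E[eps] = 1 (for dropout with keep probability p, mu2 = 1/p); the function names are illustrative and are not taken from the thesis itself.

```python
import numpy as np

def critical_std(fan_in: int, keep_prob: float = 1.0) -> float:
    """Weight standard deviation for a noisy-ReLU critical initialisation.

    Dropout with keep probability p corresponds to multiplicative noise
    eps ~ Bernoulli(p) / p, whose second moment is mu2 = 1/p. The He
    variance 2/fan_in is then scaled by 1/mu2 = p to keep the
    pre-activation variance stable across layers.
    """
    mu2 = 1.0 / keep_prob                     # second moment of dropout noise
    return np.sqrt(2.0 / (mu2 * fan_in))

def init_layer(rng, fan_in: int, fan_out: int, keep_prob: float = 1.0):
    # Zero-mean Gaussian weights at the critical variance; zero biases.
    W = rng.normal(0.0, critical_std(fan_in, keep_prob), size=(fan_in, fan_out))
    b = np.zeros(fan_out)
    return W, b

# Example: initialise a 512-wide layer trained with dropout keep_prob = 0.8.
rng = np.random.default_rng(0)
W, b = init_layer(rng, fan_in=512, fan_out=512, keep_prob=0.8)
```

With keep_prob = 1.0 (no noise), critical_std reduces to the usual He initialisation, sqrt(2/fan_in).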
Description
Thesis (MSc)--Stellenbosch University, 2021.
Keywords
Deep learning (Machine learning), Neural networks (Computer science) -- Noise, Stochastic regularisation, Critical initialisation, Signal propagation, Neural network initialisation, Noise (Computer science), UCTD