Modern gradient boosting

Date
2024-03
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH SUMMARY: Boosting is a supervised learning procedure that has gained considerable interest in statistical and machine learning owing to its powerful predictive performance. The idea of boosting is to obtain a model ensemble by sequentially fitting base learners to modified versions of the training data. The first complete boosting procedure was Adaptive Boosting (AdaBoost), designed for binary classification. Gradient boosting followed, allowing boosting to be applied to any continuous and differentiable loss function. The most frequently used version of gradient boosting is Multiple Additive Regression Trees (MART), in which trees are the base learners. In recent years there have been numerous extensions to MART aiming to improve its predictive performance and scalability. Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM) and Categorical Boosting (CatBoost) are three of these extensions, termed the modern gradient boosting methods in this thesis.

The thesis introduces boosting by reviewing the details of AdaBoost, forward stagewise additive modelling (FSAM) and gradient boosting. Notably, the equivalence of AdaBoost and FSAM under the exponential loss is proven, FSAM for regression with trees is considered, and the need for an efficient procedure such as gradient boosting is emphasised. Two derivations of gradient boosting are provided: the first views gradient boosting as an approximation to steepest descent of the empirical risk, while the second views it as a quadratic approximation of FSAM. Since trees are a popular choice of base learner in gradient boosting, details are given on MART.

The remainder of the thesis studies the modern methods, focusing on the mathematical details of their novelties. Examples, illustrations and simulations are given for some of these novelties to provide further clarity, and empirical studies investigating the generalisation performance of certain novelties are presented. More specifically, these studies consider the performance of XGBoost's regularisation parameters in tree-building, Gradient-based One-Side Sampling (GOSS) from LightGBM, the Plain and Ordered boosting modes in CatBoost, and the cosine similarity used to construct the trees in CatBoost. The experiments cover several binary classification datasets with varying characteristics: size, class imbalance, sparsity and the inclusion of categorical features.
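As a compact illustration of the two derivations mentioned above (the notation here is assumed for exposition, not quoted from the thesis), the steepest-descent view fits the m-th base learner h_m to the negative gradient of the loss evaluated at the current fit:

\[
r_{im} = -\left[\frac{\partial L\big(y_i, f(x_i)\big)}{\partial f(x_i)}\right]_{f = f_{m-1}},
\qquad
f_m(x) = f_{m-1}(x) + \nu\,\rho_m h_m(x),
\]

where \(\rho_m\) is obtained by line search and \(\nu\) is the learning rate. The quadratic view instead expands the empirical risk to second order, with gradients \(g_i\) and Hessians \(h_i\) of the loss; for a tree with leaf regions \(I_j\) and an L2 penalty \(\lambda\) on the leaf weights, this yields the familiar XGBoost-style optimal leaf weight

\[
w_j^{*} = -\,\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}.
\]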
AFRIKAANSE OPSOMMING: Boosting is a supervised learning procedure with powerful predictive ability that is widely used in statistical and machine learning. The idea of boosting is to fit base learners sequentially to adapted versions of the training data, thereby building a combined model. AdaBoost was the first complete boosting procedure for binary classification. Gradient boosting, which makes it possible to apply boosting to any differentiable and continuous loss function, followed AdaBoost. The most widely used version of gradient boosting is MART, in which regression trees serve as the base learners. In the past few years, several extensions of MART have been developed with the aim of improving the predictive ability of the models. Three of these extensions are XGBoost, LightGBM and CatBoost, referred to in this thesis as the modern gradient boosting methods.

The thesis introduces boosting by reviewing the details of AdaBoost, FSAM and gradient boosting. It is proven that AdaBoost and FSAM are equivalent in the case of the exponential loss; FSAM for regression with trees is considered; and the need to develop an efficient procedure such as gradient boosting is emphasised. Two derivations of gradient boosting are also given. The first views gradient boosting as an approximation to steepest descent of the empirical risk, while the second views it as a quadratic approximation of FSAM. Since trees are a popular choice of base learner in gradient boosting, details are given within the context of MART.

The rest of the thesis studies the modern gradient boosting methods, focusing on their novel mathematical details. To provide further clarity, examples, illustrations and simulations of some of these novelties are given. Empirical studies investigating the generalisation performance of the methods are also presented. In particular, these studies consider the performance of XGBoost's regularisation parameters when building trees, GOSS in LightGBM, the Plain and Ordered modes in CatBoost, and the cosine similarity for building trees in CatBoost. In these experiments, several binary classification datasets with varying characteristics are used: size, class imbalance, sparsity and the inclusion of categorical features.
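The novelties examined in the empirical studies are all reachable through the public scikit-learn-style interfaces of the three libraries. The sketch below (illustrative settings on synthetic data, not the thesis's experimental setup) marks where each studied parameter enters; exact parameter spellings can vary slightly across library versions.

# A minimal sketch of where the studied novelties surface in the public APIs
# of the xgboost, lightgbm and catboost packages. Values are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

# Imbalanced binary classification problem, echoing the datasets' characteristics.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# XGBoost: regularisation parameters used in tree-building
# (gamma penalises each additional leaf; reg_lambda/reg_alpha shrink leaf weights).
xgb = XGBClassifier(n_estimators=200, learning_rate=0.1,
                    gamma=1.0, reg_lambda=1.0, reg_alpha=0.0)

# LightGBM: GOSS keeps the top_rate fraction of large-gradient rows and
# samples other_rate of the remainder.
# (In LightGBM >= 4.0, GOSS is selected via data_sample_strategy='goss'.)
lgbm = LGBMClassifier(n_estimators=200, learning_rate=0.1,
                      boosting_type='goss', top_rate=0.2, other_rate=0.1)

# CatBoost: Ordered vs Plain boosting modes, and the cosine score function
# used to evaluate candidate splits when growing the trees.
cat = CatBoostClassifier(iterations=200, learning_rate=0.1,
                         boosting_type='Ordered', score_function='Cosine',
                         verbose=False)

for name, model in [('XGBoost', xgb), ('LightGBM', lgbm), ('CatBoost', cat)]:
    model.fit(X_tr, y_tr)
    print(name, 'test accuracy:', model.score(X_te, y_te))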
Description
Thesis (MCom)--Stellenbosch University, 2024.