Variable selection for kernel methods with application to binary classification
Date
2008-03
Authors
Oosthuizen, Surette
Publisher
Stellenbosch : University of Stellenbosch
Abstract
The problem of variable selection in binary kernel classification is addressed in this thesis.
Kernel methods are fairly recent additions to the statistical toolbox, having originated
approximately two decades ago in machine learning and artificial intelligence. These
methods are growing in popularity and are already frequently applied in regression and
classification problems.
Variable selection is an important step in many statistical applications. It gives a better
understanding of the problem being investigated, and subsequent analyses of
the data frequently yield more accurate results if irrelevant variables have been eliminated.
It is therefore obviously important to investigate aspects of variable selection for kernel
methods.
Chapter 2 of the thesis is an introduction to the main part presented in Chapters 3 to 6. In
Chapter 2 some general background material on kernel methods is first provided, along
with an introduction to variable selection. Empirical evidence is presented substantiating
the claim that variable selection is a worthwhile enterprise in kernel classification
problems. Several aspects which complicate variable selection in kernel methods are
discussed.
An important property of kernel methods is that the original data are effectively
transformed before a classification algorithm is applied to them. The space in which the
original data reside is called input space, while the transformed data occupy part of a
feature space. In Chapter 3 we investigate whether variable selection should be performed
in input space or rather in feature space. A new approach to selection, so-called feature-to-input
space selection, is also proposed. This approach has the attractive property of
combining information generated in feature space with easy interpretation in input space.
An empirical study reveals that effective variable selection requires utilisation of at least
some information from feature space.
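
The relationship between input space and feature space can be made concrete with a small sketch. The abstract does not say which kernels the thesis uses, so the degree-2 polynomial kernel and the function names below are assumptions chosen only because the feature map can be written out explicitly. The sketch shows that evaluating the kernel on two input-space points gives the same number as an inner product of their transformed versions in feature space.

```python
import math

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2.

    Illustrative choice of kernel (an assumption, not taken from the thesis).
    """
    return sum(a * b for a, b in zip(x, z)) ** 2

def poly2_feature_map(x):
    """Explicit feature map for 2-D inputs that corresponds to poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Two observations in a 2-D input space.
x = (1.0, 2.0)
z = (3.0, -1.0)

# Kernel evaluated directly on the input-space points.
k_input = poly2_kernel(x, z)

# Same quantity computed as an inner product in the 3-D feature space.
k_feature = dot(poly2_feature_map(x), poly2_feature_map(z))
```

The two values agree up to floating-point rounding, which is why a kernel classifier can work in feature space without ever forming the transformed data. It also hints at why selection is harder there: each feature-space coordinate mixes several input variables.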
Having confirmed in Chapter 3 that variable selection should preferably be done in feature
space, the focus in Chapter 4 is on two classes of selection criteria operating in feature
space: criteria which are independent of the specific kernel classification algorithm and
criteria which depend on this algorithm. In this regard we concentrate on two kernel
classifiers, viz. support vector machines and kernel Fisher discriminant analysis, both of
which are described in some detail in Chapter 4. The chapter closes with a simulation
study showing that two of the algorithm-independent criteria are very competitive with the
more sophisticated algorithm-dependent ones.
In Chapter 5 we incorporate a specific strategy for searching through the space of variable
subsets into our investigation. Evidence in the literature strongly suggests that backward
elimination is preferable to forward selection in this regard, and we therefore focus on
recursive feature elimination. Zero- and first-order forms of the new selection criteria
proposed earlier in the thesis are presented for use in recursive feature elimination and their
properties are investigated in a numerical study. It is found that some of the simpler zero-order
criteria perform better than the more complicated first-order ones.
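
A minimal sketch of backward elimination driven by a feature-space criterion may help fix ideas. The abstract does not define the thesis's criteria, so the criterion used here (the squared distance between the two class means in feature space under a Gaussian kernel), the kernel parameter `gamma`, and all function names are hypothetical stand-ins. The criterion is simply re-evaluated with each candidate variable left out, and the number of variables to keep is assumed known.

```python
import math

def rbf_kernel(x, z, variables, gamma=0.5):
    """Gaussian kernel evaluated on a subset of the input variables."""
    sq_dist = sum((x[j] - z[j]) ** 2 for j in variables)
    return math.exp(-gamma * sq_dist)

def mean_separation(X, y, variables):
    """Squared distance between the two class means in feature space.

    Illustrative criterion only (an assumption, not taken from the thesis).
    It needs kernel evaluations only, never explicit feature vectors.
    """
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == -1]

    def mean_kernel(A, B):
        total = sum(rbf_kernel(a, b, variables) for a in A for b in B)
        return total / (len(A) * len(B))

    return mean_kernel(pos, pos) + mean_kernel(neg, neg) - 2.0 * mean_kernel(pos, neg)

def backward_eliminate(X, y, n_keep, criterion=mean_separation):
    """Remove one variable at a time until n_keep variables remain."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        # Re-evaluate the criterion with each candidate variable left out and
        # drop the variable whose removal leaves the largest criterion value.
        drop = max(
            remaining,
            key=lambda j: criterion(X, y, [v for v in remaining if v != j]),
        )
        remaining.remove(drop)
        eliminated.append(drop)
    return remaining, eliminated

# Toy data: variable 0 separates the classes, variable 1 has the same values
# in both classes, and variable 2 is constant.
X = [(2.0, 0.5, 1.0), (2.2, -0.5, 1.0), (-2.0, 0.5, 1.0), (-2.2, -0.5, 1.0)]
y = [1, 1, -1, -1]

selected, eliminated = backward_eliminate(X, y, n_keep=1)
```

On this toy data the procedure discards variables 1 and 2 and retains variable 0. Any other subset-level criterion with the same signature could be passed in through the `criterion` argument.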
Up to the end of Chapter 5 it is assumed that the number of variables to select is known.
We do away with this restriction in Chapter 6 and propose a simple criterion which uses the
data to identify this number when a support vector machine is used. The proposed criterion
is investigated in a simulation study and compared to cross-validation, which can also be
used for this purpose. We find that the proposed criterion performs well.
The thesis concludes in Chapter 7 with a summary and several suggestions for further
research.
Description
Thesis (PhD (Statistics and Actuarial Science))—University of Stellenbosch, 2008.
Keywords
Variable selection, Support vector machines, Kernel Fisher discriminant analysis, Dissertations -- Statistics and actuarial science