Browsing by Author "Louw, Nelmarie"
Now showing 1 - 1 of 1
Results Per Page
Sort Options
- ItemAspects of the pre- and post-selection classification performance of discriminant analysis and logistic regression(Stellenbosch : Stellenbosch University, 1997-12) Louw, Nelmarie; Le Roux, N. J.; Steel, S. J.; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences.ENGLISH ABSTRACT: Discriminani analysis and logistic regression are techniques that can be used to classify entities of unknown origin into one of a number of groups. However, the underlying models and assumptions for application of the two techniques differ. In this study, the two techniques are compared with respect to classification of entities. Firstly, the two techniques were compared in situations where no data dependent variable selection took place. Several underlying distributions were studied: the normal distribution, the double exponential distribution and the lognormal distribution. The number of variables, sample sizes from the different groups and the correlation structure between the variables were varied to' obtain a large number of different configurations. .The cases of two and three groups were studied. The most important conclusions are: "for normal and double' exponential data linear discriminant analysis outperforms logistic regression, especially in cases where the ratio of the number of variables to the total sample size is large. For lognormal data, logistic regression should be preferred, except in cases where the ratio of the number of variables to the total sample size is large. " Variable selection is frequently the first step in statistical analyses. A large number of potenti8.Ily important variables are observed, and an optimal subset has to be selected for use in further analyses. Despite the fact that variable selection is often used, the influence of a selection step on further analyses of the same data, is often completely ignored. An important aim of this study was to develop new selection techniques for use in discriminant analysis and logistic regression. New estimators of the postselection error rate were also developed. A new selection technique, cross model validation (CMV) that can be applied both in discriminant analysis and logistic regression, was developed. ."This technique combines the selection of variables and the estimation of the post-selection error rate. It provides a method to determine the optimal model dimension, to select the variables for the final model and to estimate the post-selection error rate of the discriminant rule. An extensive Monte Carlo simulation study comparing the CMV technique to existing procedures in the literature, was undertaken. In general, this technique outperformed the other methods, especially with respect to the accuracy of estimating the post-selection error rate. Finally, pre-test type variable selection was considered. A pre-test estimation procedure was adapted for use as selection technique in linear discriminant analysis. In a simulation study, this technique was compared to CMV, and was found to perform well, especially with respect to correct selection. However, this technique is only valid for uncorrelated normal variables, and its applicability is therefore limited. A numerically intensive approach was used throughout the study, since the problems that were investigated are not amenable to an analytical approach.