From projection pursuit and CART to adaptive discriminant analysis?

Cited: 11
Authors
Gribonval, R [1]
Affiliation
[1] IRISA, INRIA, French Natl Ctr Comp Sci & Control, F-35042 Rennes, France
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2005 / Vol. 16 / No. 3
Funding
U.S. National Science Foundation
Keywords
classification and regression trees (CART); classification tree; discriminant analysis; mutual information; nonlinear approximation; projection pursuit; sequential testing
DOI
10.1109/TNN.2005.844900
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
While many efforts have been put into the development of nonlinear approximation theory and its applications to signal and image compression, encoding, and denoising, there seem to be very few theoretical developments of adaptive discriminant representations in the area of feature extraction, selection, and signal classification. In this paper, we try to advocate the idea that such developments and efforts are worthwhile, based on the theoretical study of a data-driven discriminant analysis method on a simple, yet instructive, example. We consider the problem of classifying a signal drawn from a mixture of two classes, using its projections onto low-dimensional subspaces. Unlike the linear discriminant analysis (LDA) strategy, which selects subspaces that do not depend on the observed signal, we consider an adaptive sequential selection of projections, in the spirit of nonlinear approximation and classification and regression trees (CART): at each step, the subspace is enlarged in a direction that maximizes the mutual information with the unknown class. We derive explicit characterizations of this adaptive discriminant analysis (ADA) strategy in two situations. When the two classes are Gaussian with the same covariance matrix but different means, the adaptive subspaces are actually nonadaptive and can be computed with an algorithm similar to orthonormal matching pursuit. When the classes are centered Gaussians with different covariances, the adaptive subspaces are spanned by eigenvectors of an operator given by the covariance matrices (just as could be predicted by regular LDA); however, we prove that the order of observation of the components along these eigenvectors actually depends on the observed signal. Numerical experiments on synthetic data illustrate how data-dependent features can be used to outperform LDA on a classification task, and we discuss how our results could be applied in practice.
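The greedy step described in the abstract (enlarge the subspace in the direction that maximizes mutual information with the class) can be sketched numerically. The toy example below is an assumption of ours, not code from the paper: it uses two centered Gaussian classes with different diagonal covariances and ranks candidate coordinate axes by a simple histogram estimate of the mutual information between a 1-D projection and the class label. A full ADA step would maximize mutual information conditionally on the components already observed; ranking by marginal mutual information is a deliberate simplification for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two centered Gaussian classes with different diagonal covariances (toy data).
# Coordinates 0 and 1 have the largest class-variance ratio (4x), so they are
# the most discriminative; coordinate 2 has identical variances (uninformative).
d = 5
cov0 = np.diag([1.0, 1.0, 1.0, 1.0, 1.0])
cov1 = np.diag([4.0, 0.25, 1.0, 2.0, 0.5])
n = 2000
X = np.vstack([rng.multivariate_normal(np.zeros(d), cov0, n),
               rng.multivariate_normal(np.zeros(d), cov1, n)])
y = np.repeat([0, 1], n)

def mutual_information(proj, labels, bins=30):
    """Histogram estimate of I(C; <w, X>) in nats for a 1-D projection."""
    joint, _, _ = np.histogram2d(proj, labels, bins=[bins, 2])
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal of the projection
    pc = p.sum(axis=0, keepdims=True)   # marginal of the class
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ pc)[nz])).sum())

# Greedy sequential selection: at each step, add the coordinate axis whose
# projection carries the largest estimated mutual information with the class.
remaining = list(range(d))
selected = []
for _ in range(3):
    best = max(remaining, key=lambda i: mutual_information(X[:, i], y))
    selected.append(best)
    remaining.remove(best)
print("selection order:", selected)
```

Because the classes here are centered, the discriminative information lies entirely in the variances, which is exactly the second regime discussed in the abstract; the equal-variance coordinate is never selected, while the high-ratio coordinates are picked first.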
Pages: 522-532
Page count: 11