Dimensionality reduction using genetic algorithms

Cited by: 551
Authors
Raymer, ML [1]
Punch, WE
Goodman, ED
Kuhn, LA
Jain, AK
Affiliations
[1] Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824 USA
[2] Michigan State Univ, Case Ctr Comp Aided Engn & Mfg, E Lansing, MI 48824 USA
[3] Michigan State Univ, Dept Biochem, E Lansing, MI 48824 USA
Funding
U.S. National Science Foundation;
Keywords
curse of dimensionality; feature extraction; feature selection; genetic algorithms; pattern classification;
DOI
10.1109/4235.850656
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pattern recognition generally requires that objects be described in terms of a set of measurable features. The selection and quality of the features representing each pattern have a considerable bearing on the success of subsequent pattern classification. Feature extraction is the process of deriving new features from the original features in order to reduce the cost of feature measurement, increase classifier efficiency, and allow higher classification accuracy. Many current feature extraction techniques involve linear transformations of the original pattern vectors to new vectors of lower dimensionality. While this is useful for data visualization and increasing classification efficiency, it does not necessarily reduce the number of features that must be measured, since each new feature may be a linear combination of all of the features in the original pattern vector. Here, we present a new approach to feature extraction in which feature selection, feature extraction, and classifier training are performed simultaneously using a genetic algorithm. The genetic algorithm optimizes a vector of feature weights, which are used to scale the individual features in the original pattern vectors in either a linear or a nonlinear fashion. A masking vector is also employed to perform simultaneous selection of a subset of the features. We employ this technique in combination with the k-nearest-neighbor classification rule, and compare the results with classical feature selection and extraction techniques, including sequential floating forward feature selection and linear discriminant analysis. We also present results for the identification of favorable water-binding sites on protein surfaces, an important problem in biochemistry and drug design.
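The approach summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the chromosome layout (weights followed by mask genes), the weight range [0, 1], the per-feature penalty, the validation-set fitness, and the selection, crossover, and mutation settings are all assumptions chosen for brevity, and only linear feature scaling is shown. The function names `knn_accuracy`, `fitness`, and `evolve` are hypothetical.

```python
import numpy as np

def knn_accuracy(X_train, y_train, X_test, y_test, k=3):
    """Accuracy of a plain k-nearest-neighbor vote (Euclidean distance)."""
    # Pairwise squared distances, shape (n_test, n_train).
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest]  # labels of the k neighbors, shape (n_test, k)
    pred = np.array([np.bincount(v).argmax() for v in votes])
    return float((pred == y_test).mean())

def fitness(chrom, X_tr, y_tr, X_va, y_va, k=3, penalty=0.01):
    """Validation accuracy minus an assumed cost per selected feature."""
    n = X_tr.shape[1]
    weights, mask = chrom[:n], chrom[n:] > 0.5  # first half weights, second half mask
    if not mask.any():
        return 0.0  # an empty feature subset cannot classify anything
    scale = weights * mask  # masked-out features get weight 0
    acc = knn_accuracy(X_tr * scale, y_tr, X_va * scale, y_va, k)
    return acc - penalty * mask.sum()

def evolve(X_tr, y_tr, X_va, y_va, pop_size=30, generations=25, seed=0):
    """Evolve (weights, mask) pairs; returns the best weights, mask, and fitness."""
    rng = np.random.default_rng(seed)
    n = X_tr.shape[1]
    pop = rng.random((pop_size, 2 * n))
    pop[0] = 1.0  # assumption: seed one unweighted, all-features baseline
    for _ in range(generations):
        scores = np.array([fitness(c, X_tr, y_tr, X_va, y_va) for c in pop])
        elite = pop[np.argsort(scores)[::-1][: pop_size // 2]]  # keep the top half
        n_children = pop_size - len(elite)
        pa = elite[rng.integers(len(elite), size=n_children)]
        pb = elite[rng.integers(len(elite), size=n_children)]
        # Uniform crossover: each gene comes from either parent with equal chance.
        children = np.where(rng.random(pa.shape) < 0.5, pa, pb)
        # Mutation: resample roughly 10% of the genes uniformly in [0, 1).
        mutate = rng.random(children.shape) < 0.1
        children = np.where(mutate, rng.random(children.shape), children)
        pop = np.vstack([elite, children])
    scores = np.array([fitness(c, X_tr, y_tr, X_va, y_va) for c in pop])
    best = pop[np.argmax(scores)]
    return best[:n], best[n:] > 0.5, float(scores.max())
```

Calling `evolve(X_train, y_train, X_val, y_val)` on integer-labeled data returns a weight vector, a boolean mask of selected features, and the best fitness found. Because the top half of each generation is carried over unchanged, the returned fitness is never lower than that of the seeded all-features baseline.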
Pages: 164-171
Page count: 8
Related papers
41 in total
[1]  
ABOLA EE, 1987, PROTEIN DATA BANK CR, P107
[2]  
[Anonymous], 1989, GENETIC ALGORITHM SE
[3]  
[Anonymous], 1966, NATURAL AUTOMATA USE
[4]  
[Anonymous], 1998, UCI REPOSITORY MACHI
[5]  
[Anonymous], P 11 INT JOINT C ART
[6]   PROTEIN DATA BANK - COMPUTER-BASED ARCHIVAL FILE FOR MACROMOLECULAR STRUCTURES [J].
BERNSTEIN, FC ;
KOETZLE, TF ;
WILLIAMS, GJB ;
MEYER, EF ;
BRICE, MD ;
RODGERS, JR ;
KENNARD, O ;
SHIMANOUCHI, T ;
TASUMI, M .
JOURNAL OF MOLECULAR BIOLOGY, 1977, 112 (03) :535-542
[7]   Pattern recognition using discriminative feature extraction [J].
Biem, A ;
Katagiri, S ;
Juang, BH .
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 1997, 45 (02) :500-504
[8]   NEAREST NEIGHBOR PATTERN CLASSIFICATION [J].
COVER, TM ;
HART, PE .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1967, 13 (01) :21-+
[9]   The role of structure in antibody cross-reactivity between peptides and folded proteins [J].
Craig, L ;
Sanschagrin, PC ;
Rozek, A ;
Lackie, S ;
Kuhn, LA ;
Scott, JK .
JOURNAL OF MOLECULAR BIOLOGY, 1998, 281 (01) :183-201
[10]  
CROSBY JL, 1967, SCI PROG, V55, P279