Innovative genetic algorithms for chemoinformatics

Cited by: 20
Authors
Lavine, BK [1]
Davidson, CE [1]
Moores, AJ [1]
Affiliation
[1] Clarkson Univ, Dept Chem, Potsdam, NY 13699 USA
Keywords
genetic algorithms; chemoinformatics; perceptron
DOI
10.1016/S0169-7439(01)00193-9
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper, we report on the development of a genetic algorithm (GA) for pattern recognition analysis of multivariate chemical data. The GA identifies feature subsets that optimize the separation of the classes in a plot of the two or three largest principal components of the data. Because principal components maximize variance, the bulk of the information encoded by the selected features is about differences between classes in the data set. The principal component (PC) plot functions as an embedded information filter. Sets of features are selected based on their principal component plots, with a good principal component plot generated by features whose variance or information is primarily about differences between classes in the data set. This limits the GA to searching for these types of feature subsets, significantly reducing the size of the search space. In addition, the pattern recognition GA focuses on those classes and/or samples that are difficult to classify by boosting their weights over successive generations, using a perceptron to learn the class and sample weights. Samples that consistently classify correctly are not as heavily weighted in the analysis as samples that are difficult to classify. The pattern recognition GA integrates aspects of artificial intelligence and evolutionary computation to yield a "smart" one-pass procedure for feature selection. The efficacy and efficiency of the pattern recognition GA are demonstrated via problems from chemical communication and environmental analysis. (C) 2002 Elsevier Science B.V. All rights reserved.
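The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the fitness function or the perceptron update, so the code below assumes a k-nearest-neighbour score computed in the plot of the two largest principal components, and a simple additive weight boost stands in for the perceptron. The names `pc_scores`, `sample_hits`, `fitness`, and `run_ga`, and all parameter defaults, are hypothetical.

```python
import numpy as np

def pc_scores(X, n_components=2):
    """Project mean-centered data onto its largest principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def sample_hits(X, y, mask, k=3):
    """Per sample: fraction of its k nearest PC-plot neighbours in the same class."""
    scores = pc_scores(X[:, mask])
    d = np.linalg.norm(scores[:, None, :] - scores[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]
    return (y[nn] == y[:, None]).mean(axis=1)

def fitness(X, y, mask, weights, k=3):
    """Weighted class separation in the PC plot (assumed scoring rule)."""
    if mask.sum() < 2:                     # a 2-D plot needs two features
        return 0.0
    return float(weights @ sample_hits(X, y, mask, k))

def run_ga(X, y, pop_size=30, generations=40, p_mut=0.05, lr=0.5, seed=0):
    """Search for a boolean feature mask that separates classes in the PC plot."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pop = rng.random((pop_size, p)) < 0.5  # random initial feature subsets
    uniform = np.full(n, 1.0 / n)
    weights = uniform.copy()
    for _ in range(generations):
        fits = np.array([fitness(X, y, m, weights) for m in pop])
        pop = pop[np.argsort(fits)[::-1]]  # best chromosome first
        best = pop[0].copy()
        # Stand-in for the perceptron: boost samples the best subset
        # still classifies poorly, then renormalize.
        hits = sample_hits(X, y, best) if best.sum() >= 2 else np.zeros(n)
        weights = weights + lr * (1.0 - hits) / n
        weights /= weights.sum()
        # Top half reproduces via uniform crossover and bit-flip mutation.
        parents = pop[: pop_size // 2]
        a = rng.integers(0, len(parents), pop_size - 1)
        b = rng.integers(0, len(parents), pop_size - 1)
        cross = rng.random((pop_size - 1, p)) < 0.5
        children = np.where(cross, parents[a], parents[b])
        children ^= rng.random(children.shape) < p_mut
        pop = np.vstack([best[None, :], children])  # elitism keeps the best
    final = np.array([fitness(X, y, m, uniform) for m in pop])
    return pop[np.argmax(final)].copy()
```

Calling `run_ga(X, y)` with a samples-by-features array `X` and an integer class-label array `y` returns a boolean mask over the features. Because the sample weights change every generation, the final population is re-scored with uniform weights before the returned mask is chosen.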
Pages: 161-171
Page count: 11