Comparative study of class data analysis with PCA-LDA, SIMCA, PLS, ANNs, and k-NN

Cited by: 86
Authors
Tominaga, Y [1]
Affiliation
[1] Dainippon Pharmaceut Co Ltd, Discovery Res Labs, Dept Chem 1, Suita, Osaka 5640053, Japan
Keywords
PCA-LDA (principal component analysis-linear discriminant analysis); SIMCA (soft independent modeling by class analogy); PLS2 (partial least-squares2); artificial neural networks; k-nearest neighbor method
DOI
10.1016/S0169-7439(99)00034-9
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Three types of chemotherapeutic agents (antibacterials, antineoplastics, and antifungals) registered in the MDL Drug Data Report (MDDR) database were used as the training data set, and a classification study was performed using the following seven methods: principal component analysis-linear discriminant analysis (PCA-LDA), soft independent modeling by class analogy (SIMCA), partial least-squares2 (PLS2), artificial neural networks (ANNs), the nearest neighbor method (NN), a combined method of Ward clustering and NN (W-NN), and a combined method of genetic algorithms (GAs) and NN (GA-NN). The number of correctly classified samples decreased in the following order: NN, ANNs, GA-NN, SIMCA, PLS2, W-NN, and PCA-LDA. Using these models, a prediction study was then performed on a test set consisting of drugs registered in the Comprehensive Medicinal Chemistry (CMC) database. The number of correctly predicted samples decreased in the following order: NN, GA-NN, W-NN, SIMCA, PCA-LDA, ANNs, and PLS2. NN gave the best model from the viewpoints of both classification and prediction, while overfitting was observed in ANNs and PLS2. Although the fitness and predictiveness of GA-NN and W-NN were inferior to those of NN, the predictiveness of the two methods was superior to that of PCA-LDA, SIMCA, ANNs, and PLS2. © 1999 Elsevier Science B.V. All rights reserved.
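The abstract ranks methods by two counts: samples correctly classified in the training set (fitness) and samples correctly predicted in an external test set (predictiveness). The following is a minimal sketch of how those two counts could be obtained for the nearest neighbor method. It is not the paper's procedure: the Euclidean distance, the leave-one-out evaluation of the training set, the function names, and the two-dimensional descriptor values are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_label(query, samples, labels, skip=None):
    """Return the label of the sample closest to `query`.

    `skip` optionally excludes one sample index, which allows
    leave-one-out evaluation on the training set itself.
    """
    candidates = (i for i in range(len(samples)) if i != skip)
    best = min(candidates, key=lambda i: dist(query, samples[i]))
    return labels[best]


def count_correct_fit(samples, labels):
    """Leave-one-out count of correctly classified training samples."""
    return sum(
        nearest_label(x, samples, labels, skip=i) == y
        for i, (x, y) in enumerate(zip(samples, labels))
    )


def count_correct_pred(train_x, train_y, test_x, test_y):
    """Count of test samples whose nearest training sample shares their class."""
    return sum(
        nearest_label(x, train_x, train_y) == y
        for x, y in zip(test_x, test_y)
    )


# Invented toy descriptors (NOT data from the paper), two per class.
train_x = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8), (0.0, 5.0), (0.3, 5.2)]
train_y = ["antibacterial"] * 2 + ["antineoplastic"] * 2 + ["antifungal"] * 2

# Invented external test set; the last point lies between the classes.
test_x = [(0.1, 0.3), (4.9, 5.2), (0.2, 4.7), (2.6, 2.6)]
test_y = ["antibacterial", "antineoplastic", "antifungal", "antibacterial"]

fit = count_correct_fit(train_x, train_y)
pred = count_correct_pred(train_x, train_y, test_x, test_y)
print(f"fitness: {fit}/{len(train_x)}  predictiveness: {pred}/{len(test_x)}")
```

A method whose fitness count is high but whose predictiveness count drops sharply on the external set shows the pattern the abstract describes as overfitting.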
Pages: 105-115
Page count: 11