Variable selection and model validation of 2D and 3D molecular descriptors

Cited by: 49
Authors
Nicholls, A
MacCuish, NE
MacCuish, JD
Affiliations
[1] OpenEye Sci Software Inc, Santa Fe, NM 87507 USA
[2] Mesa Analyt & Comp LLC, Santa Fe, NM 87501 USA
Keywords
cluster analysis; electrostatics; field overlaps; hypothesis generation; k-fold cross-validation; molecular shape; QSAR; structural fingerprints; Tversky
DOI
10.1007/s10822-004-5202-8
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
We have found that molecular shape and electrostatics, in conjunction with 2D structural fingerprints, are important variables in discriminating classes of active and inactive compounds. The subject of this paper is how to explore the selection of these variables and identify their relative importance in quantitative structure-activity relationships (QSAR) analysis. We show the use of these variables in a form of similarity searching with respect to a crystal structure of a known bound ligand. This analysis is then validated through k-fold cross-validation of enrichments via several common classifiers. Additionally, we show an effective methodology using the variables in hypothesis generation; namely, when the crystal structure of a bound ligand is not known.
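The abstract and keywords name Tversky similarity and enrichment as the working quantities. The following is a minimal sketch of both, not the authors' implementation: it assumes fingerprints are represented as Python sets of on-bit indices, and the function names `tversky` and `enrichment_factor`, along with the `alpha`, `beta`, and `fraction` parameters, are hypothetical choices made for illustration.

```python
from typing import Sequence, Set


def tversky(a: Set[int], b: Set[int], alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky index between two fingerprints given as sets of on-bit indices.

    alpha weights features unique to `a` (e.g. the query), beta those unique
    to `b`. alpha = beta = 1 gives Tanimoto; alpha = beta = 0.5 gives Dice.
    """
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


def enrichment_factor(scores: Sequence[float], labels: Sequence[int],
                      fraction: float = 0.1) -> float:
    """Active rate in the top-scoring `fraction` divided by the overall active rate.

    `labels` holds 1 for active and 0 for inactive compounds (an assumed encoding).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(round(fraction * len(ranked))))
    hits_top = sum(label for _, label in ranked[:n_top])
    total_hits = sum(labels)
    if total_hits == 0:
        return 0.0
    return (hits_top / n_top) / (total_hits / len(ranked))
```

In a k-fold setting, one would compute such an enrichment on each held-out fold and report the spread across folds; the choice of classifier and fold count is left open here because the abstract does not specify them.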
Pages: 451-474
Page count: 24