Sample-Distance Partial Least Squares: PLS Optimized for Many Variables, with Application to CoMFA

Cited by: 428
Authors
BUSH, BL
NACHBAR, RB
Affiliation
[1] Merck Research Laboratories, Building 50SW-100, Merck and Co., Inc., Rahway, NJ 07065
Keywords
PARTIAL LEAST SQUARES; STRUCTURE-ACTIVITY RELATIONSHIP; MOLECULAR MODELING; COMFA; FACTOR ANALYSIS
DOI
10.1007/BF00124364
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Three-dimensional molecular modeling can provide an unlimited number m of structural properties. Comparative Molecular Field Analysis (CoMFA), for example, may calculate thousands of field values for each model structure. When m is large, partial least squares (PLS) is the statistical method of choice for fitting and predicting biological responses. Yet PLS is usually implemented in a property-based fashion which is optimal only for small m. We describe here a sample-based formulation of PLS which can be used to fit any single response (bioactivity). SAMPLS reduces all explanatory data to the pairwise 'distances' among n samples (molecules), or equivalently to an n-by-n covariance matrix C. This matrix, unmodified, can be used to fit all PLS components. Furthermore, SAMPLS will validate the model by modern resampling techniques, at a cost independent of m. We have implemented SAMPLS as a Fortran program and have reproduced conventional and cross-validated PLS analyses of data from two published studies. Full (leave-each-out) cross-validation of a typical CoMFA takes 0.2 CPU s. SAMPLS is thus ideally suited to structure-activity analysis based on CoMFA fields or bonded topology. The sample-distance formulation also relates PLS to methods like cluster analysis and nonlinear mapping, and shows how drastically PLS simplifies the information in CoMFA fields.
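The sample-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' Fortran program; it is a plain-Python reading of the abstract, assuming the standard single-response PLS recursion in which each score vector is the covariance matrix applied to the current response residual and then orthogonalised against earlier scores. The names `gram_matrix`, `sampls_predict` and `loo_press` are hypothetical, and plain lists are used only to keep the example dependency-free.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_matrix(X):
    """n-by-n raw inner products; the only step that touches all m columns."""
    return [[dot(xi, xj) for xj in X] for xi in X]


def sampls_predict(G, y, train, test, n_components):
    """Fit single-response PLS on the samples in `train` using only G.

    Returns {sample index: predicted response} for all train and test samples.
    """
    n = len(train)
    everyone = list(dict.fromkeys(list(train) + list(test)))  # de-duplicated
    # Centre on the training mean without revisiting the m variables.
    row_mean = {i: sum(G[i][k] for k in train) / n for i in everyone}
    grand = sum(row_mean[k] for k in train) / n

    def cov(i, j):
        # Centred covariance between any sample i and training sample j.
        return G[i][j] - row_mean[i] - row_mean[j] + grand

    y_mean = sum(y[j] for j in train) / n
    resid = {j: y[j] - y_mean for j in train}  # response residual
    pred = {i: y_mean for i in everyone}
    scores = []  # score vectors of earlier components
    for _ in range(n_components):
        # Raw score: the unmodified covariance rows applied to the residual.
        u = {i: sum(cov(i, j) * resid[j] for j in train) for i in everyone}
        t = dict(u)
        # Orthogonalise against earlier scores; the coefficients are computed
        # from training samples only and then applied to test samples as well.
        for prev in scores:
            coef = (sum(prev[j] * u[j] for j in train)
                    / sum(prev[j] ** 2 for j in train))
            for i in everyone:
                t[i] -= coef * prev[i]
        tt = sum(t[j] ** 2 for j in train)
        if tt == 0.0:  # nothing left to extract
            break
        beta = sum(t[j] * resid[j] for j in train) / tt
        for j in train:
            resid[j] -= beta * t[j]
        for i in everyone:
            pred[i] += beta * t[i]
        scores.append(t)
    return pred


def loo_press(G, y, n_components):
    """Leave-each-out predictive residual sum of squares, reusing G for every fold."""
    n = len(y)
    press = 0.0
    for i in range(n):
        train = [j for j in range(n) if j != i]
        pred = sampls_predict(G, y, train, [i], n_components)
        press += (y[i] - pred[i]) ** 2
    return press
```

In this sketch `gram_matrix` is called once on the n-by-m table of field values; every later fit and every cross-validation fold works only on the n-by-n matrix, which is why the resampling cost does not grow with m.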
Pages: 587-619
Number of pages: 33