EXTENDING THE TREND VECTOR - THE TREND MATRIX AND SAMPLE-BASED PARTIAL LEAST-SQUARES

Cited by: 58
Authors
SHERIDAN, RP
NACHBAR, RB
BUSH, BL
Affiliation
[1] Molecular Systems Department, Merck Research Laboratories, Rahway, NJ 07065
Keywords
ATOM PAIRS; PLS; SAMPLS; TOPOLOGICAL DESCRIPTORS; QSAR;
DOI
10.1007/BF00126749
Chinese Library Classification (CLC) codes
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline codes
071010; 081704
Abstract
Trend vector analysis [Carhart, R.E. et al., J. Chem. Inf. Comput. Sci., 25 (1985) 64], in combination with topological descriptors such as atom pairs, has proved useful in drug discovery for ranking large collections of chemical compounds in order of predicted biological activity. When the compounds with the highest predicted activities are tested, the fraction of actives among them is often several-fold higher than in a randomly selected set. A trend vector is simply the one-dimensional array of correlations between the biological activity of interest and a set of properties, or 'descriptors', of the compounds in a training set. This paper examines two methods for generalizing the trend vector to improve the predicted rank order. The trend matrix method finds the correlations between the residuals and the simultaneous occurrence of pairs of descriptors, and stores them in a two-dimensional analog of the trend vector. The SAMPLS method derives a linear model by partial least squares (PLS), using the 'sample-based' formulation of PLS [Bush, B.L. and Nachbar, R.B., J. Comput.-Aided Mol. Design, 7 (1993) 587] for efficiency in treating the large number of descriptors. PLS accumulates a predictive model as a sum of linear components. Expressed as a vector of prediction coefficients on properties, the first PLS component is proportional to the trend vector; subsequent components adjust the model toward full least squares. For both methods the residuals decrease, but the risk of overfitting the training set increases, so we also describe statistical checks to prevent overfitting. These methods are applied to two data sets: a small homologous series of disubstituted piperidines tested on the dopamine receptor, and a large set of diverse chemical structures, some of which are active at the muscarinic receptor. Each data set is split into a training set and a test set, and the activities in the test set are predicted from a fit on the training set.
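A minimal sketch of the trend vector and trend matrix as described above, for illustration only. Several details are assumptions rather than the paper's method: Pearson correlation as the measure, a compound's ranking score taken as the dot product of its descriptor vector with the trend vector, and residuals taken from a straight-line fit of activity on that score. All function names are hypothetical, and how the paper combines the matrix with the vector to re-rank compounds is not shown.

```python
from math import sqrt


def mean(v):
    return sum(v) / len(v)


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den > 0 else 0.0


def trend_vector(X, y):
    """T[j] = corr(descriptor j, activity) over the training set (rows = compounds)."""
    return [pearson([row[j] for row in X], y) for j in range(len(X[0]))]


def trend_scores(X, T):
    """Assumed ranking score: dot product of each compound's descriptors with T."""
    return [sum(t * x for t, x in zip(T, row)) for row in X]


def linear_fit_residuals(s, y):
    """Residuals of an ordinary least-squares fit y ~ a + b*s (assumed residual definition)."""
    ms, my = mean(s), mean(y)
    sxx = sum((si - ms) ** 2 for si in s)
    b = sum((si - ms) * (yi - my) for si, yi in zip(s, y)) / sxx if sxx > 0 else 0.0
    a = my - b * ms
    return [yi - (a + b * si) for si, yi in zip(s, y)]


def trend_matrix(X, residuals):
    """M[j][k] = corr(x_j * x_k, residuals): co-occurrence of descriptor pairs vs. residual."""
    p = len(X[0])
    M = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j, p):
            co = [row[j] * row[k] for row in X]
            M[j][k] = M[k][j] = pearson(co, residuals)
    return M


# Toy usage with made-up descriptor counts and activities.
X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
y = [3.0, 2.0, 1.0, 0.5]
T = trend_vector(X, y)
scores = trend_scores(X, T)
ranking = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)
M = trend_matrix(X, linear_fit_residuals(scores, y))
```

The matrix is symmetric and needs storage proportional to the square of the number of descriptors, which is the cost the abstract contrasts with SAMPLS.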
Both the trend matrix and the SAMPLS approach improve the predictions over the simple trend vector. The SAMPLS approach is superior to the trend matrix in that it requires much less storage and CPU time, and it also provides a useful set of axes for visualizing properties of the compounds. We describe a randomization method for determining the optimum number of PLS components that is much faster than leave-one-out cross-validation for large training sets.
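The storage advantage of a sample-based formulation comes from working with the n x n matrix of inner products between training compounds instead of a matrix indexed by descriptors, which matters when atom-pair descriptors far outnumber compounds. The sketch below is a generic kernel-style PLS1 (NIPALS rewritten in terms of sample inner products), not the authors' SAMPLS code; the function names are hypothetical. It assumes the descriptors have been autoscaled, because the first PLS weight vector is proportional to the descriptor-activity covariances, and these equal the trend vector of correlations up to a constant factor only when every descriptor has unit variance.

```python
from math import sqrt


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(M, v):
    return [dot(row, v) for row in M]


def autoscale(X):
    """Zero mean, unit sample variance per descriptor column (constant columns become 0)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)) for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0 for j in range(p)]
            for row in X]


def fit_kernel_pls1(X, y, n_comp):
    """PLS1 computed from the n x n matrix of sample inner products K = Xc Xc^T."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    ya = [v - y_mean for v in y]
    K = [[dot(a, b) for b in Xc] for a in Xc]  # n x n, independent of the descriptor count
    comps = []
    for _ in range(n_comp):
        t_raw = matvec(K, ya)           # X_a (X_a^T y_a): score for weights w ~ X_a^T y_a
        norm = sqrt(dot(t_raw, t_raw))
        if norm < 1e-12:
            break                       # y is already fit as well as X allows
        t = [v / norm for v in t_raw]   # unit-length score vector
        q = dot(t, ya)                  # inner regression coefficient of y on t
        comps.append((t, norm, q, K, ya))
        # Deflate X_{a+1} = (I - t t^T) X_a, i.e. K_{a+1} = (I - t t^T) K_a (I - t t^T)
        Kt = matvec(K, t)
        tKt = dot(t, Kt)
        K = [[K[i][j] - t[i] * Kt[j] - Kt[i] * t[j] + t[i] * t[j] * tKt
              for j in range(n)] for i in range(n)]
        ya = [v - q * ti for v, ti in zip(ya, t)]
    return {"Xc": Xc, "means": means, "y_mean": y_mean, "comps": comps}


def predict_kernel_pls1(model, x_new):
    """Predict one compound from its inner products with the centered training compounds."""
    xc = [v - m for v, m in zip(x_new, model["means"])]
    k = [dot(row, xc) for row in model["Xc"]]  # k_1 = Xc x
    y_hat = model["y_mean"]
    for t, norm, q, K, ya in model["comps"]:
        tau = dot(k, ya) / norm                # new compound's score x_a^T w_a
        y_hat += q * tau
        # Deflate consistently with training: k_{a+1} = (I - t t^T)(k_a - tau K_a t)
        r = [ki - tau * kti for ki, kti in zip(k, matvec(K, t))]
        tr = dot(t, r)
        k = [ri - tr * ti for ri, ti in zip(r, t)]
    return y_hat


# Toy usage with made-up data.
Z = autoscale([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])
y = [3.0, 2.0, 1.0, 0.5]
model = fit_kernel_pls1(Z, y, n_comp=2)
fitted = [predict_kernel_pls1(model, z) for z in Z]
```

For clarity the sketch keeps the centered descriptors to score new compounds and stores one n x n kernel per component; new compounds must be scaled with the training set's means and standard deviations, which this `autoscale` does not return. With as many components as the rank of the centered descriptors, the fit reaches full least squares, matching the abstract's description of later components.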
Pages: 323-340
Number of pages: 18
Related papers
16 in total
  • [1] SAMPLE-DISTANCE PARTIAL LEAST-SQUARES - PLS OPTIMIZED FOR MANY VARIABLES, WITH APPLICATION TO COMFA
    BUSH, BL
    NACHBAR, RB
    [J]. JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 1993, 7 (05) : 587 - 619
  • [2] ATOM PAIRS AS MOLECULAR-FEATURES IN STRUCTURE ACTIVITY STUDIES - DEFINITION AND APPLICATIONS
    CARHART, RE
    SMITH, DH
    VENKATARAGHAVAN, R
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1985, 25 (02) : 64 - 73
  • [3] THE PROBABILITY OF CHANCE CORRELATION USING PARTIAL LEAST-SQUARES (PLS)
    CLARK, M
    CRAMER, RD
    [J]. QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIPS, 1993, 12 (02) : 137 - 145
  • [4] COMPARATIVE MOLECULAR-FIELD ANALYSIS (COMFA) .1. EFFECT OF SHAPE ON BINDING OF STEROIDS TO CARRIER PROTEINS
    CRAMER, RD
    PATTERSON, DE
    BUNCE, JD
    [J]. JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, 1988, 110 (18) : 5959 - 5967
  • [5] PARTIAL LEAST-SQUARES REGRESSION - A TUTORIAL
    GELADI, P
    KOWALSKI, BR
    [J]. ANALYTICA CHIMICA ACTA, 1986, 185 : 1 - 17
  • [6] NOVEL PIPERIDINE SIGMA RECEPTOR LIGANDS AS POTENTIAL ANTIPSYCHOTIC-DRUGS
    GILLIGAN, PJ
    CAIN, GA
    CHRISTOS, TE
    COOK, L
    DRUMMOND, S
    JOHNSON, AL
    KERGAYE, AA
    MCELROY, JF
    ROHRBACH, KW
    SCHMIDT, WK
    TAM, SW
    [J]. JOURNAL OF MEDICINAL CHEMISTRY, 1992, 35 (23) : 4344 - 4361
  • [7] QUADRATIC PLS REGRESSION
    HOSKULDSSON, A
    [J]. JOURNAL OF CHEMOMETRICS, 1992, 6 (06) : 307 - 334
  • [9] THE KERNEL ALGORITHM FOR PLS
    LINDGREN, F
    GELADI, P
    WOLD, S
    [J]. JOURNAL OF CHEMOMETRICS, 1993, 7 (01) : 45 - 59
  • [10] A METHOD FOR AUTOMATIC-GENERATION OF NOVEL CHEMICAL STRUCTURES AND ITS POTENTIAL APPLICATIONS TO DRUG DISCOVERY
    NILAKANTAN, R
    BAUMAN, N
    VENKATARAGHAVAN, R
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1991, 31 (04) : 527 - 530