QSAR with few compounds and many features

Cited by: 66
Authors
Hawkins, DM
Basak, SC
Shi, XF
Affiliations
[1] Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA
[2] Univ Minnesota, Nat Resources Res Inst, Duluth, MN 55811 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2001 / Vol. 41 / Issue 03
Keywords
DOI
10.1021/ci0001177
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Fitting quantitative structure-activity relationships (QSAR) requires different statistical methodologies and, to some degree, philosophies, depending on the "shape" of the data matrix. When there are few features and many compounds, it is reasonable to expect that good feature subset selection can be made and that nonlinearities and nonadditivities can be detected and diagnosed. Where there are many features and few compounds, this is unrealistic. Methods such as ridge regression (RR), partial least squares (PLS), and principal component regression (PCR), which abjure feature selection and rely on linearity, may provide good predictions and fair understanding. We report a development of ridge regression for the underdetermined case that uses generalized cross-validation to choose the ridge constant and performs F-tests for additional information. Conventional regression diagnostics can be used in follow-up to identify nonlinearities and other departures from the model. We illustrate the approach with QSAR models of four data sets using calculated molecular descriptors.
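The idea of choosing the ridge constant by generalized cross-validation (GCV) when features outnumber compounds can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `ridge_gcv`, the candidate grid `ks`, the textbook GCV criterion n·RSS / (n − edf)², and the choice to count the intercept as one effective degree of freedom are all assumptions made for this example. The F-tests mentioned in the abstract are not covered.

```python
import numpy as np

def ridge_gcv(X, y, ks):
    """Ridge regression with the ridge constant picked by GCV (illustrative sketch).

    X  : (n, p) descriptor matrix; p may exceed n.
    y  : (n,) response vector.
    ks : candidate ridge constants, all strictly positive.
    Returns (best_k, coefficients, intercept, gcv_scores).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]

    # Center so the intercept is handled outside the penalty.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean

    # Thin SVD is valid whether n > p or n < p.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Uty = U.T @ yc

    scores = []
    for k in ks:
        shrink = s**2 / (s**2 + k)        # shrinkage factor per singular direction
        resid = yc - U @ (shrink * Uty)   # residuals of the ridge fit
        edf = 1.0 + shrink.sum()          # assumed: +1 for the intercept
        scores.append(n * (resid @ resid) / (n - edf) ** 2)
    scores = np.array(scores)

    best_k = ks[int(np.argmin(scores))]
    beta = Vt.T @ ((s / (s**2 + best_k)) * Uty)
    intercept = y_mean - x_mean @ beta
    return best_k, beta, intercept, scores
```

A call such as `ridge_gcv(X, y, np.logspace(-3, 3, 25))` returns the selected constant together with the fitted coefficients; predictions for new compounds are then `X_new @ beta + intercept`. Working through the SVD means every candidate constant reuses one decomposition, which keeps the search cheap when the descriptor count is large.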
Pages: 663 - 670
Page count: 8
Related papers
31 in total
  • [1] PLS regression methods
    Höskuldsson, Agnar
    [J]. Journal of Chemometrics, 1988, 2 (03) : 211 - 228
  • [2] Topological indices: Their nature and mutual relatedness
    Basak, SC
    Balaban, AT
    Grunwald, GD
    Gute, BD
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2000, 40 (04) : 891 - 898
  • [3] Use of topostructural, topochemical, and geometric parameters in the prediction of vapor pressure: A hierarchical QSAR approach
    Basak, SC
    Gute, BD
    Grunwald, GD
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1997, 37 (04) : 651 - 655
  • [4] PREDICTING MUTAGENICITY OF CHEMICALS USING TOPOLOGICAL AND QUANTUM-CHEMICAL PARAMETERS - A SIMILARITY BASED STUDY
    BASAK, SC
    GRUNWALD, GD
    [J]. CHEMOSPHERE, 1995, 31 (01) : 2529 - 2546
  • [5] Use of statistical and neural net approaches in predicting toxicity of chemicals
    Basak, SC
    Grunwald, GD
    Gute, BD
    Balasubramanian, K
    Opitz, D
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2000, 40 (04) : 885 - 890
  • [6] Prediction of complement-inhibitory activity of benzamidines using topological and geometric parameters
    Basak, SC
    Gute, BD
    Ghatak, S
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1999, 39 (02) : 255 - 260
  • [7] BASAK SC, 1995, P 16 INT CANC C, P413
  • [8] Basak SC., 1999, Topological Indices and Related Descriptors in QSAR and QSPR, P563
  • [9] The peculiar shrinkage properties of partial least squares regression
    Butler, NA
    Denham, MC
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2000, 62 : 585 - 593
  • [10] Cook D.R., 1999, APPL REGRESSION INCL