The problem of overfitting

Cited by: 1746
Author
Hawkins, DM [1]
Affiliation
[1] Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2004 / Vol. 44 / No. 01
DOI
10.1021/ci0342472
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
The problem of overfitting in model fitting for quantitative measurements is discussed. Two types of overfitting can be distinguished: using a model that is more flexible than it needs to be, and using a model that includes irrelevant components or predictors. Adding predictors that perform no useful function has a practical cost: whenever the regression is later used to make predictions, those predictors must still be measured and recorded so that their values can be substituted into the model. Adding irrelevant predictors can also make predictions worse, because the coefficients fitted to them add random variation to the subsequent predictions.
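The last point of the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the data-generating model (one real predictor with slope 2, unit-variance noise), the sample sizes, and the helper names `solve`, `fit_ols`, `mse`, and `simulate` are all assumptions chosen for illustration. It fits ordinary least squares once with only the relevant predictor and once with ten additional pure-noise predictors, then compares the error on the training data with the error on fresh data.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def mse(X, y, beta):
    """Mean squared prediction error of coefficients beta on (X, y)."""
    errs = (yi - sum(b * xi for b, xi in zip(beta, r)) for r, yi in zip(X, y))
    return sum(e * e for e in errs) / len(y)


def simulate(n_train=20, n_test=200, n_noise=10, n_trials=100, seed=0):
    """Average train/test MSE with and without irrelevant predictors.

    Assumed (hypothetical) truth: y = 2 * x1 + noise. Columns after x1
    are pure noise and have no relationship to y.
    """
    rng = random.Random(seed)

    def draw(n):
        X = [[1.0] + [rng.gauss(0, 1) for _ in range(1 + n_noise)]
             for _ in range(n)]
        y = [2.0 * r[1] + rng.gauss(0, 1) for r in X]
        return X, y

    def relevant_only(X):
        return [r[:2] for r in X]  # intercept + the one real predictor

    totals = {"train_small": 0.0, "test_small": 0.0,
              "train_full": 0.0, "test_full": 0.0}
    for _ in range(n_trials):
        X_tr, y_tr = draw(n_train)
        X_te, y_te = draw(n_test)
        b_small = fit_ols(relevant_only(X_tr), y_tr)
        b_full = fit_ols(X_tr, y_tr)
        totals["train_small"] += mse(relevant_only(X_tr), y_tr, b_small)
        totals["test_small"] += mse(relevant_only(X_te), y_te, b_small)
        totals["train_full"] += mse(X_tr, y_tr, b_full)
        totals["test_full"] += mse(X_te, y_te, b_full)
    return {k: v / n_trials for k, v in totals.items()}


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the larger model should fit the training data more closely yet predict new data less well, which is the pattern the abstract describes: the coefficients fitted to the noise columns carry random variation into every later prediction.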
Pages: 1-12
Page count: 12