The role of Occam's razor in knowledge discovery

Cited by: 236
Authors
Domingos, P [1]
Affiliations
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Keywords
model selection; overfitting; multiple comparisons; comprehensible models; domain knowledge;
DOI
10.1023/A:1009868929893
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many KDD systems incorporate an implicit or explicit preference for simpler models, but this use of "Occam's razor" has been strongly criticized by several authors (e.g., Schaffer, 1993; Webb, 1996). This controversy arises partly because Occam's razor has been interpreted in two quite different ways. The first interpretation (simplicity is a goal in itself) is essentially correct, but is at heart a preference for more comprehensible models. The second interpretation (simplicity leads to greater accuracy) is much more problematic. A critical review of the theoretical arguments for and against it shows that it is unfounded as a universal principle, and demonstrably false. A review of empirical evidence shows that it also fails as a practical heuristic. This article argues that its continued use in KDD risks causing significant opportunities to be missed, and should therefore be restricted to the comparatively few applications where it is appropriate. The article proposes and reviews the use of domain constraints as an alternative for avoiding overfitting, and examines possible methods for handling the accuracy-comprehensibility trade-off.
Pages: 409-425
Number of pages: 17
Related papers
102 in total
  • [1] Abu-Mostafa Y. S., 1990, Journal of Complexity, V6, P192, DOI 10.1016/0885-064X(90)90006-Y
  • [2] Akaike H., 1978, Bayesian analysis of minimum AIC procedure, Annals of the Institute of Statistical Mathematics, V30(1), P9-14
  • [3] ANDREWS R, 1996, P NIPS 96 WORKSH RUL
  • [4] [Anonymous], KDD NUGGETS
  • [5] [Anonymous], 1989, P 4 EUR WORK SESS LE
  • [6] [Anonymous], ADV KNOWLEDGE DISCOV
  • [7] [Anonymous], 1997, Proceedings of the Fourteenth National Conference on Artificial Intelligence AAAI-97
  • [8] [Anonymous], 1938, OCKHAM STUDIES SELEC
  • [9] [Anonymous], P 12 INT C MACH LEAR
  • [10] [Anonymous], P 1997 INT C KNOWL D