The racing algorithm: Model selection for lazy learners

Cited by: 126
Authors
Maron, O. [1]
Moore, A. W. [1]
Affiliation
[1] Carnegie Mellon University, Pittsburgh, PA 15213
Keywords
lazy learning; model selection; cross validation; optimization; attribute selection
DOI
10.1023/A:1006556606079
Chinese Library Classification (CLC) code
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Given a set of models and some training data, we would like to find the model that best describes the data. Finding the model with the lowest generalization error is a computationally expensive process, especially if the number of testing points is high or if the number of models is large. Optimization techniques such as hill climbing or genetic algorithms are helpful but can end up with a model that is arbitrarily worse than the best one, or cannot be used at all because there is no distance metric on the space of discrete models. In this paper we develop a technique called "racing" that tests the set of models in parallel, quickly discards those models that are clearly inferior, and concentrates the computational effort on differentiating among the better models. Racing is especially suitable for selecting among lazy learners, since training requires negligible expense and incremental testing using leave-one-out cross validation is efficient. We use racing to select among various lazy learning algorithms and to find relevant features in applications ranging from robot juggling to lesion detection in MRI scans.
Pages: 193-225
Page count: 33
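To make the racing idea described in the abstract concrete, the following is a minimal sketch of a racing loop over lazy learners using incremental leave-one-out cross validation. The Hoeffding-style confidence interval used to decide when a model can be discarded is one common instantiation, and the `loo_error(data, i)` interface on each model is a hypothetical convenience, not the paper's exact formulation.

```python
import math
import random

def race(models, data, delta=0.05, error_range=1.0):
    """Race a set of lazy learners, eliminating clearly inferior ones early.

    Each element of `models` is assumed to expose a loo_error(data, i) method
    returning its error on held-out point i when trained on the rest
    (cheap for lazy learners, since "training" is just storing the data).
    """
    survivors = list(range(len(models)))
    totals = [0.0] * len(models)
    order = list(range(len(data)))
    random.shuffle(order)  # visit test points in random order

    for n, i in enumerate(order, start=1):
        for m in survivors:
            totals[m] += models[m].loo_error(data, i)

        # Hoeffding-style bound: with probability 1 - delta the true mean
        # error of each surviving model lies within +/- eps of its empirical
        # mean after n test points, given errors bounded by error_range.
        eps = error_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        means = {m: totals[m] / n for m in survivors}
        best_upper = min(means.values()) + eps

        # Discard any model whose optimistic estimate is still worse than
        # the pessimistic estimate of the current leader.
        survivors = [m for m in survivors if means[m] - eps <= best_upper]
        if len(survivors) == 1:
            break

    return survivors
```

Because the bound shrinks as more leave-one-out trials accumulate, poor models are dropped after only a few test points, and the full testing budget is spent on the handful of models that remain close to the leader.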