Robust classification for imprecise environments

Cited by: 723
Authors
Provost, F
Fawcett, T
Affiliations
[1] NYU, New York, NY 10012 USA
[2] Hewlett Packard Labs, Palo Alto, CA 94304 USA
Keywords
classification; learning; uncertainty; evaluation; comparison; multiple models; cost-sensitive learning; skewed distributions
DOI
10.1023/A:1007601015854
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In real-world environments it usually is difficult to specify target operating conditions precisely, for example, target misclassification costs. This uncertainty makes building robust classification systems problematic. We show that it is possible to build a hybrid classifier that will perform at least as well as the best available classifier for any target conditions. In some cases, the performance of the hybrid actually can surpass that of the best known classifier. This robust performance extends across a wide variety of comparison frameworks, including the optimization of metrics such as accuracy, expected cost, lift, precision, recall, and workforce utilization. The hybrid also is efficient to build, to store, and to update. The hybrid is based on a method for the comparison of classifier performance that is robust to imprecise class distributions and misclassification costs. The ROC convex hull (ROCCH) method combines techniques from ROC analysis, decision analysis and computational geometry, and adapts them to the particulars of analyzing learned classifiers. The method is efficient and incremental, minimizes the management of classifier performance data, and allows for clear visual comparisons and sensitivity analyses. Finally, we point to empirical evidence that a robust hybrid classifier indeed is needed for many real-world problems.
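The abstract describes the ROCCH method only in outline. The sketch below is one minimal reading of it, not the authors' implementation. It assumes each classifier is summarized as a single (false positive rate, true positive rate) point, builds the upper convex hull with a plain monotone-chain scan, and then picks the hull vertex that the standard iso-performance line (slope = p(neg)·cost(FP) / (p(pos)·cost(FN))) touches first. The function names, the cost parameters, and the example points are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def _cross(o: Point, a: Point, b: Point) -> float:
    """2D cross product of vectors o->a and o->b (positive means a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points: List[Point]) -> List[Point]:
    """Upper convex hull of ROC points, sorted by false positive rate.

    The trivial classifiers (0, 0) "always negative" and (1, 1)
    "always positive" are always included.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last vertex while it lies on or below the line
        # from the vertex before it to the new point p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def best_operating_point(hull: List[Point], p_pos: float,
                         cost_fp: float, cost_fn: float) -> Point:
    """Hull vertex with minimum expected cost under the given conditions.

    Assumes 0 < p_pos < 1 and positive costs (hypothetical parameters).
    """
    slope = ((1.0 - p_pos) * cost_fp) / (p_pos * cost_fn)
    # Maximizing tpr - slope * fpr is equivalent to minimizing expected cost.
    return max(hull, key=lambda pt: pt[1] - slope * pt[0])


def expected_cost(pt: Point, p_pos: float,
                  cost_fp: float, cost_fn: float) -> float:
    """Expected misclassification cost per example at an ROC point."""
    fpr, tpr = pt
    return p_pos * (1.0 - tpr) * cost_fn + (1.0 - p_pos) * fpr * cost_fp


# Hypothetical classifiers; (0.5, 0.7) is dominated and falls off the hull.
classifiers = [(0.1, 0.5), (0.3, 0.8), (0.5, 0.7), (0.6, 0.95)]
hull = roc_convex_hull(classifiers)
```

Because only hull vertices can be optimal for any combination of class distribution and costs, a classifier that lies below the hull can be discarded without knowing the target conditions. Changing `p_pos` or the costs only changes the slope, so the same hull can be reused for every scenario.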
Pages: 203-231
Page count: 29