A lot of randomness is hiding in accuracy

Cited by: 81

Authors:

Ben-David, Arle [1]

Affiliations:

[1] Holon Inst Technol, Dept Informat Management, IL-58102 Holon, Israel

Source:

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2007 / Vol. 20 / No. 07

Keywords:

accuracy; kappa; "chance driven" hits; machine learning; AUC

DOI:

10.1016/j.engappai.2007.01.001

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812 [Computer science and technology]

Abstract:

The proportion of successful hits, usually referred to as "accuracy", is by far the most dominant meter for measuring classifiers' accuracy. This is despite the fact that accuracy does not compensate for hits that can be attributed to mere chance. Is it a meaningful flaw in the context of machine learning? Have we been using the wrong meter for decades? The results of this study do suggest that the answers to these questions are positive. Cohen's kappa, a meter that does compensate for random hits, was compared with accuracy, using a benchmark of fifteen datasets and five well-known classifiers. It turned out that the average probability of a hit being the result of mere chance exceeded one third (!). It was also found that the proportion of random hits varied with different classifiers that were applied even to a single dataset. Consequently, the rankings of classifiers' accuracy, with and without compensation for random hits, differed from each other in eight out of the fifteen datasets. Therefore, accuracy may well fail in its main task, namely to properly measure the accuracy-wise merits of the classifiers themselves. © 2007 Elsevier Ltd. All rights reserved.
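
To make the contrast between the two meters concrete, here is a minimal sketch of how accuracy and Cohen's kappa can be computed from the same predictions. It is not taken from the paper: the function name `accuracy_and_kappa` and the toy labels are hypothetical, and the code assumes the standard definition of kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of hits and p_e is the proportion of hits expected by chance from the marginal class frequencies.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, chance_agreement, kappa) for two label sequences.

    Hypothetical helper for illustration; assumes the standard Cohen's kappa.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    # Observed agreement p_o: the plain proportion of hits ("accuracy").
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement p_e: for each class, the product of its share among
    # the true labels and its share among the predicted labels.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    # Kappa is undefined when chance agreement is already perfect
    # (a single class on both sides); returning NaN here is an assumption.
    if p_e == 1.0:
        return p_o, p_e, float("nan")

    kappa = (p_o - p_e) / (1.0 - p_e)
    return p_o, p_e, kappa


# Toy example: 90 negatives, 10 positives, and a classifier that always
# predicts the majority class.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100
acc, chance, kappa = accuracy_and_kappa(y_true, y_pred)
print(acc, chance, kappa)
```

In this toy case the majority-class predictor reaches an accuracy of 0.9, yet every one of its hits is attributable to chance under the marginal frequencies, so kappa is 0. This is the kind of gap between the two meters that the abstract describes.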


Pages: 875-885

Page count: 11
