Multiple comparisons in induction algorithms

Cited by: 118

Authors:

Jensen, DD [1]

Cohen, PR [1]

Affiliations:

[1] Univ Massachusetts, Dept Comp Sci, Expt Knowledge Syst Lab, Amherst, MA 01003 USA

Source:

MACHINE LEARNING | 2000 / Volume 38 / Issue 03

Keywords:

inductive learning; overfitting; oversearching; attribute selection; hypothesis testing; parameter estimation

DOI:

10.1023/A:1007631014630

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure (MCP). We analyze the statistical properties of MCPs and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation.
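The effect described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper; it assumes a toy setting in which every candidate item is equally useless (its score is the accuracy of a coin flip on a small sample), and all function names and parameter values are hypothetical. It shows that the maximum score grows with the number of items compared, and how a Bonferroni adjustment tightens the per-comparison significance level to compensate.

```python
import random
import statistics


def max_score_of_random_items(n_items, n_samples, rng):
    """Score n_items useless candidates and return the best observed score.

    Each candidate's score is the fraction of n_samples coin flips that
    come up heads, so every candidate has the same true score of 0.5.
    """
    best = 0.0
    for _ in range(n_items):
        hits = sum(rng.random() < 0.5 for _ in range(n_samples))
        best = max(best, hits / n_samples)
    return best


def mean_max_score(n_items, n_samples=50, trials=300, seed=0):
    """Average the maximum score over repeated trials (toy parameters)."""
    rng = random.Random(seed)
    scores = [max_score_of_random_items(n_items, n_samples, rng)
              for _ in range(trials)]
    return statistics.fmean(scores)


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni adjustment."""
    return alpha / n_comparisons


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        print(n, round(mean_max_score(n), 3), bonferroni_alpha(0.05, n))
```

With one item the average score stays near the true value of 0.5. As the number of compared items increases, the selected maximum drifts upward even though no item is actually better, which is why an unadjusted score for the selected item overstates its quality.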

Citation

Pages: 309-338

Number of pages: 30
