Khiops: A statistical discretization method of continuous attributes

Cited by: 70
Authors
Boulle, M [1]
Affiliations
[1] France Telecom R&D, F-22300 Lannion, France
Keywords
data mining; machine learning; discretization; data analysis
DOI
10.1023/B:MACH.0000019804.29836.05
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In supervised machine learning, some algorithms are restricted to discrete data and have to discretize continuous attributes. Many discretization methods, based on statistical criteria, information content, or other specialized criteria, have been studied in the past. In this paper, we propose the discretization method Khiops, based on the chi-square statistic. In contrast with related methods ChiMerge and ChiSplit, this method optimizes the chi-square criterion in a global manner on the whole discretization domain and does not require any stopping criterion. A theoretical study followed by experiments demonstrates the robustness and the good predictive performance of the method.
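To make the chi-square criterion named in the abstract concrete, the following is a minimal sketch of how a candidate discretization can be scored: the attribute values are grouped into intervals, class frequencies are counted per interval, and the Pearson chi-square statistic is computed on the resulting contingency table. It is an illustration only, not the Khiops algorithm, whose global optimization procedure is not described in this record. The function names `contingency_table` and `chi_square` and the toy data are assumptions introduced here.

```python
def contingency_table(values, labels, cut_points):
    """Count class frequencies per interval (hypothetical helper).

    Intervals are delimited by the sorted cut_points; a value v falls in
    interval k when exactly k cut points are strictly below it.
    """
    classes = sorted(set(labels))
    table = [[0] * len(classes) for _ in range(len(cut_points) + 1)]
    for v, y in zip(values, labels):
        interval = sum(v > c for c in cut_points)
        table[interval][classes.index(y)] += 1
    return table


def chi_square(table):
    """Pearson chi-square statistic of an intervals-by-classes table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of interval and class.
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


# Toy data (assumed for illustration): eight values, two classes.
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

good_split = chi_square(contingency_table(values, labels, [4.5]))
poor_split = chi_square(contingency_table(values, labels, [2.5]))
```

On this toy data the cut at 4.5 separates the classes perfectly and scores 8.0, while the cut at 2.5 scores about 2.67, so a chi-square-based method would prefer the first discretization.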
Pages: 53-69
Page count: 17
Related papers
16 in total
[1] BERTELSEN R, 1994, P 7 FLOR ART INT RES, P122
[2] BERTIER P., 1981, ANAL DONNEES MULTIDI
[3] Blake C.L., 1998, UCI repository of machine learning databases
[4] BOULLE M, 2001, NTFTRD7339
[5] Breiman L., 1998, CLASSIFICATION REGRE
[6] BURDSALL B, 1997, P 2 INT ICSC S FUZZ, P217
[7] CATLETT J, 1991, P EUR WORK SESS LEAR, P87
[8] Dougherty J., 1995, MACHINE LEARNING P 1, P194, DOI 10.1016/B978-1-55860-377-6.50032-3
[9] FAYYAD UM, 1992, MACH LEARN, V8, P87, DOI 10.1023/A:1022638503176
[10] HOLTE, RC. VERY SIMPLE CLASSIFICATION RULES PERFORM WELL ON MOST COMMONLY USED DATASETS [J]. MACHINE LEARNING, 1993, 11 (01): 63-91