Implications of the Dirichlet assumption for discretization of continuous variables in naive Bayesian classifiers

Cited by: 32
Authors
Hsu, CN [1]
Huang, HJ
Wong, TT
Affiliations
[1] Acad Sinica, Inst Informat Sci, Taipei 115, Taiwan
[2] Natl Chiao Tung Univ, Dept Comp & Informat Sci, Hsinchu 300, Taiwan
[3] Natl Cheng Kung Univ, Inst Informat Management, Tainan 701, Taiwan
Keywords
naive Bayesian classifiers; Dirichlet distributions; perfect aggregation; continuous variables; discretization; lazy discretization; interval data;
DOI
10.1023/A:1026367023636
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 [Pattern Recognition and Intelligent Systems]; 0812 [Computer Science and Technology]; 0835 [Software Engineering]; 1405 [Intelligent Science and Technology];
Abstract
In a naive Bayesian classifier, discrete variables as well as discretized continuous variables are assumed to have Dirichlet priors. This paper describes the implications and applications of this model selection choice. We start by reviewing key properties of Dirichlet distributions. The most important of these is "perfect aggregation," which allows us to explain why discretization works for a naive Bayesian classifier. Because perfect aggregation holds for Dirichlets, we can explain why, in general, discretization can outperform parameter estimation that assumes a normal distribution. We can also explain why a wide variety of well-known discretization methods, such as entropy-based, ten-bin, and bin-log l, perform comparably well, with insignificant differences. We designed experiments to verify this explanation using synthesized and real data sets, and showed that beyond the well-known methods, a wide variety of discretization methods all perform similarly. Our analysis leads to a lazy discretization method, which discretizes continuous variables according to the test data. The Dirichlet assumption implies that lazy methods can perform as well as eager discretization methods. We empirically confirmed this implication and extended the lazy method to classify set-valued and multi-interval data with a naive Bayesian classifier.
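The lazy discretization idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the fixed interval half-width, and the two-bin (inside/outside the interval) likelihood estimate are all illustrative assumptions. The sketch only shows the key property that the cut points are chosen per test instance, with counts smoothed by a symmetric Dirichlet (Laplace) prior.

```python
# Hypothetical sketch of lazy discretization in a naive Bayesian
# classifier (names and parameters are illustrative, not from the paper).
# For each continuous test value, an interval is formed around it lazily,
# and P(value falls in interval | class) is estimated from the training
# data with a symmetric Dirichlet (Laplace) prior of pseudo-count alpha.

def lazy_nb_classify(train, labels, x_test, width=1.0, alpha=1.0):
    """train: list of continuous feature vectors; labels: class labels;
    x_test: one continuous feature vector; width: interval half-width;
    alpha: Dirichlet smoothing pseudo-count."""
    classes = sorted(set(labels))
    n = len(train)
    scores = {}
    for c in classes:
        rows = [x for x, y in zip(train, labels) if y == c]
        # class prior, smoothed by the symmetric Dirichlet pseudo-counts
        score = (len(rows) + alpha) / (n + alpha * len(classes))
        for j, v in enumerate(x_test):
            # lazy step: the discretization interval depends on the
            # test value itself, not on pre-computed training-time bins
            lo, hi = v - width, v + width
            hits = sum(1 for x in rows if lo <= x[j] <= hi)
            # two effective bins: inside vs. outside the interval
            score *= (hits + alpha) / (len(rows) + 2 * alpha)
        scores[c] = score
    return max(scores, key=scores.get)
```

Because nothing is binned until prediction time, the method pays no discretization cost for training instances whose regions are never queried, which is the sense in which the Dirichlet assumption makes lazy and eager discretization comparable.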
Pages: 235-263
Page count: 29