New heuristic method for data discretization based on rough set theory

Cited by: 8
Authors
ZHAO, Jun [1]
ZHOU, Ying-hua [1]
Affiliations
[1] Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing
Source
Journal of China Universities of Posts and Telecommunications | 2009 / Vol. 16 / No. 6
Funding
National Natural Science Foundation of China
Keywords
cut; cut significance; data discretization; rough set theory; selection probability;
DOI
10.1016/S1005-8885(08)60296-4
Abstract
Data discretization contributes much to the induction of classification rules or trees by machine learning methods. Rough set theory is an effective tool for discretizing continuous information systems. Herein, a new method is proposed to improve typical rough set based heuristic algorithms for data discretization by using decision information to reduce the number of candidate cuts and by measuring cut significance more reasonably through a new concept of cut selection probability. Simulations demonstrate that, compared with other typical discretization algorithms based on rough set theory, the proposed method discretizes continuous information systems more effectively. It can improve the predictive accuracy of information systems while still conceptually preserving their consistency. © 2009 The Journal of China Universities of Posts and Telecommunications.
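The abstract does not define cut selection probability, so the following is not the authors' method. It is only a minimal Python sketch of the two general ideas the abstract names: using decision information to shrink the set of candidate cuts, and scoring a cut by how much it helps separate objects. The function names `candidate_cuts` and `discerned_pairs`, the midpoint rule, and the pair-counting score are all assumptions chosen for illustration.

```python
from itertools import combinations


def candidate_cuts(values, decisions):
    """Return midpoints between adjacent distinct attribute values.

    Assumption for illustration: a midpoint is kept only when the two
    neighbouring values are associated with more than one decision class,
    since a cut between values of one single class separates nothing useful.
    """
    classes_by_value = {}
    for v, d in zip(values, decisions):
        classes_by_value.setdefault(v, set()).add(d)
    ordered = sorted(classes_by_value)
    cuts = []
    for lo, hi in zip(ordered, ordered[1:]):
        if len(classes_by_value[lo] | classes_by_value[hi]) > 1:
            cuts.append((lo + hi) / 2)
    return cuts


def discerned_pairs(cut, values, decisions):
    """Count object pairs with different decisions that lie on opposite
    sides of the cut (a simple, hypothetical significance score)."""
    count = 0
    for i, j in combinations(range(len(values)), 2):
        if decisions[i] != decisions[j] and (values[i] < cut) != (values[j] < cut):
            count += 1
    return count


# Toy single-attribute information system (made-up data).
values = [1.0, 2.0, 3.0, 4.0, 5.0]
decisions = ["a", "a", "b", "b", "b"]

cuts = candidate_cuts(values, decisions)
best = max(cuts, key=lambda c: discerned_pairs(c, values, decisions))
```

On this toy data, taking every midpoint would give four candidate cuts, whereas the decision-aware filter leaves only the one between 2.0 and 3.0. A heuristic discretizer would typically repeat the select-best-cut step until the discretized system is consistent.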
Pages: 113-120
Number of pages: 7