Influence of data discretization on efficiency of Bayesian classifier for authorship attribution

Cited by: 16
Author
Baron, Grzegorz [1]
Affiliation
[1] Silesian Tech Univ, PL-44100 Gliwice, Poland
Source
KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS 18TH ANNUAL CONFERENCE, KES-2014 | 2014 / Vol. 35
Keywords
Bayesian classifier; Naive Bayes; stylometry; authorship attribution; text analysis; classification; discretization; binarization; DECISION TREE; NAIVE
DOI
10.1016/j.procs.2014.08.201
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Authorship attribution is a research area within data mining, and various methods can be applied to it. This paper presents results of research on how data discretization influences the efficiency of the Naive Bayes classifier. The analysis was carried out on two datasets built from texts by two male and two female authors, using the WEKA data mining software framework. Binary classification was performed separately on each dataset across a wide range of discretization parameters, in order to investigate how the way of discretizing the data affects the quality of Naive Bayes classification. The numerical test results are compared and discussed, and observations and conclusions are drawn. (C) 2014 The Authors. Published by Elsevier B.V.
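The pipeline the abstract describes (discretize numeric features, then classify with Naive Bayes) can be illustrated with a minimal sketch. This is not the paper's code: the study used WEKA, and the equal-width binning, the Laplace smoothing, the helper names `equal_width_bins` and `train_naive_bayes`, and the toy feature values below are all assumptions chosen for illustration only.

```python
import math
from collections import Counter, defaultdict


def equal_width_bins(values, n_bins):
    """Return a function mapping a value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant features

    def to_bin(x):
        idx = int((x - lo) / width)
        return max(0, min(n_bins - 1, idx))  # clamp values outside the range

    return to_bin


def train_naive_bayes(rows, labels, n_bins):
    """Fit Naive Bayes on equal-width discretized features; return a predictor."""
    n_features = len(rows[0])
    binners = [
        equal_width_bins([row[j] for row in rows], n_bins)
        for j in range(n_features)
    ]
    class_counts = Counter(labels)
    bin_counts = defaultdict(int)  # (label, feature index, bin index) -> count
    for row, label in zip(rows, labels):
        for j, x in enumerate(row):
            bin_counts[(label, j, binners[j](x))] += 1

    def predict(row):
        best_label, best_score = None, -math.inf
        for label, count in class_counts.items():
            score = math.log(count / len(labels))  # log prior
            for j, x in enumerate(row):
                # Laplace-smoothed likelihood of this bin given the class
                hits = bin_counts[(label, j, binners[j](x))] + 1
                score += math.log(hits / (count + n_bins))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict


# Hypothetical toy data: two numeric style features per text, two authors.
rows = [
    [0.10, 0.80], [0.15, 0.75], [0.20, 0.90],
    [0.70, 0.20], [0.80, 0.10], [0.75, 0.30],
]
labels = ["A", "A", "A", "B", "B", "B"]

predict = train_naive_bayes(rows, labels, n_bins=2)
print(predict([0.12, 0.85]), predict([0.78, 0.15]))
```

Varying `n_bins` and re-measuring accuracy on held-out texts is one simple way to probe the kind of dependency between discretization parameters and classification quality that the abstract mentions.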
Pages: 1112-1121
Page count: 10