A Naive Bayes Text Classifier Based on Bootstrap Averaging

Cited by: 5
Authors
白莉媛 (Bai Liyuan) [1]
黄晖 (Huang Hui) [2]
刘素华 (Liu Suhua) [1]
阎秋玲 (Yan Qiuling) [1]
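Affiliations
[1] College of Information Science and Engineering, Henan University of Technology
[2] School of Science, Henan University of Technology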
Keywords
distributional clustering; text classification; naive Bayes classifier; bootstrap averaging
DOI
Not available
Chinese Library Classification (CLC) numbers
TP391.1 [Text information processing]; TP18 [Theory of artificial intelligence]
Discipline classification codes
081203; 0835; 081104; 0812; 1405
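Abstract
When a naive Bayes text classifier is trained on word clusters, its probability estimates can be strongly biased, which lowers classification accuracy. To address this problem, this paper starts from the word clusters produced by a probabilistic distributional clustering algorithm. It builds ordered word subsequences according to the mutual information between each word and its cluster. It then applies random sampling with replacement to these sequences to construct sample sets of comparable size. The average of the parameters estimated on these sample sets is taken as the trained parameters, which are then used to classify unseen documents. Experiments on public text datasets show that, compared with the conventional naive Bayes training method, the proposed training method achieves higher classification accuracy while remaining relatively simple.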
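The training procedure summarized in the abstract can be illustrated with the Python sketch below. It is a minimal sketch under stated assumptions and is not the authors' implementation.

The assumptions are as follows:

- The word clusters are taken as given, for example from a distributional clustering algorithm.
- Smoothing is Laplace smoothing with a hypothetical parameter `alpha`.
- The abstract does not say exactly how the mutual-information ordering shapes the resampling, so this sketch skips the ordering step. For each cluster, it simply draws a same-size sample of the cluster's member words with replacement.

Each round re-estimates P(cluster | class) from the class counts of the sampled words. The final parameters are the average over `n_rounds` rounds. The names `train_bootstrap_nb` and `classify`, and the toy data, are hypothetical.

```python
import math
import random
from collections import Counter


def train_bootstrap_nb(docs, labels, clusters, n_rounds=20, alpha=1.0, seed=0):
    """Estimate naive Bayes parameters over word clusters by bootstrap averaging.

    docs: list of token lists; labels: class label of each doc;
    clusters: dict cluster_id -> list of member words (assumed precomputed).
    Returns (prior, cond), where cond[c][k] approximates P(cluster k | class c).
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    # Per-class word frequencies from the training documents.
    word_counts = {c: Counter() for c in classes}
    for tokens, y in zip(docs, labels):
        word_counts[y].update(tokens)
    label_counts = Counter(labels)
    prior = {c: label_counts[c] / len(labels) for c in classes}

    cluster_ids = sorted(clusters)
    cond = {c: {k: 0.0 for k in cluster_ids} for c in classes}
    for _ in range(n_rounds):
        # Bootstrap sample: for each cluster, draw as many words as it has,
        # uniformly with replacement from its members (shared by all classes).
        # NOTE: the paper's mutual-information ordering is not modeled here.
        sample = {k: [rng.choice(clusters[k]) for _ in clusters[k]] for k in cluster_ids}
        for c in classes:
            counts = {k: sum(word_counts[c][w] for w in sample[k]) for k in cluster_ids}
            total = sum(counts.values()) + alpha * len(cluster_ids)
            for k in cluster_ids:
                # Laplace-smoothed estimate for this round, accumulated as a running mean.
                cond[c][k] += (counts[k] + alpha) / total / n_rounds
    return prior, cond


def classify(tokens, prior, cond, clusters):
    """Return the class with the highest naive Bayes log score.

    Words that belong to no cluster are ignored.
    """
    word_to_cluster = {w: k for k, members in clusters.items() for w in members}
    best, best_score = None, -math.inf
    for c in prior:
        score = math.log(prior[c])
        for w in tokens:
            k = word_to_cluster.get(w)
            if k is not None:
                score += math.log(cond[c][k])
        if score > best_score:
            best, best_score = c, score
    return best


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    docs = [["goal", "match", "team"], ["team", "score", "goal"],
            ["vote", "party", "election"], ["party", "policy", "vote"]]
    labels = ["sport", "sport", "politics", "politics"]
    clusters = {0: ["goal", "match", "team", "score"],
                1: ["vote", "party", "election", "policy"]}
    prior, cond = train_bootstrap_nb(docs, labels, clusters, n_rounds=50)
    print(classify(["goal", "team"], prior, cond, clusters))      # sport
    print(classify(["vote", "election"], prior, cond, clusters))  # politics
```

In the conventional estimate, each cluster's count is the sum over all of its member words, used once. This sketch replaces that single estimate with the mean over resampled member sets. Each round produces a proper distribution over clusters, so the averaged parameters still sum to 1 for every class.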
Pages: 190-192
Page count: 3