Text Classification Method Based on Diffusion Kernels on Statistical Manifolds

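Cited by: 2
Authors
Li Kan [1]
Zhou Shibin [2]
Liu Yushu [1]
Affiliations
[1] School of Computer Science, Beijing Institute of Technology
[2] School of Computer Science and Technology, China University of Mining and Technology
Keywords
statistical manifold; diffusion kernel; Dirichlet distribution; text classification
DOI
10.16451/j.cnki.issn1003-6059.2012.02.017
Chinese Library Classification number
TP391.1 [Text information processing]
Discipline classification codes
081203; 0835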
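Abstract
This paper proposes the Dirichlet compound multinomial (DCM) manifold. Because the DCM manifold is homeomorphic and isometric to the positive hemisphere manifold, a pullback mapping carries the geodesic distance on the positive hemisphere manifold over to the DCM manifold, which gives the DCM manifold a distance metric. On this basis, two diffusion kernels on the statistical manifold are constructed: the Dirichlet compound multinomial (DCM) diffusion kernel and the Dirichlet compound multinomial inverse document frequency (DCMIDF) diffusion kernel. Experiments on the WebKB Top4 and 20 Newsgroups corpora indicate that the DCM manifold describes text more accurately than Euclidean space. Compared with support vector machines that use a polynomial kernel or a negative geodesic distance kernel, the support vector machines based on the DCM and DCMIDF diffusion kernels achieve good text classification performance.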
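The pullback idea in the abstract can be illustrated on the better-known multinomial manifold, where the map θ ↦ 2√θ sends the simplex onto the positive part of a sphere and the geodesic distance becomes 2·arccos(Σ√(θᵢθ′ᵢ)). The sketch below covers only that multinomial case with a leading-order diffusion kernel; it is not the DCM or DCMIDF construction of this paper, whose mapping is not given in the abstract. The function names, the smoothing constant, and the diffusion time `t` are all hypothetical choices made for illustration.

```python
import math


def smoothed_frequencies(counts, smoothing=1.0):
    """Turn raw term counts into a point on the probability simplex.

    Additive smoothing (an assumption of this sketch) keeps every
    coordinate strictly positive.
    """
    total = sum(counts) + smoothing * len(counts)
    return [(c + smoothing) / total for c in counts]


def geodesic_distance(p, q):
    """Fisher geodesic distance between two multinomial parameter vectors.

    Obtained by pulling back the great-circle distance on the sphere of
    radius 2 through the map theta -> 2 * sqrt(theta).
    """
    affinity = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to [0, 1] so rounding error cannot push acos out of its domain.
    affinity = min(1.0, max(0.0, affinity))
    return 2.0 * math.acos(affinity)


def diffusion_kernel(p, q, t=0.5):
    """Leading-order approximation exp(-d^2 / (4t)) of the heat kernel.

    The normalising factor (4 * pi * t)^(-n/2) is dropped because it is
    the same constant for every pair of documents.
    """
    d = geodesic_distance(p, q)
    return math.exp(-(d * d) / (4.0 * t))


def gram_matrix(documents, t=0.5):
    """Pairwise kernel values for a list of term-count vectors."""
    points = [smoothed_frequencies(doc) for doc in documents]
    return [[diffusion_kernel(p, q, t) for q in points] for p in points]
```

A Gram matrix built this way could be passed to any support vector machine implementation that accepts a precomputed kernel. The value of `t` would normally be tuned on held-out data.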
Pages: 339-345
Page count: 7