Information fusion for text classification - an experimental comparison

Cited by: 17
Authors
Dasigi, V
Mann, RC
Protopopescu, VA
Affiliations
[1] So Polytechn State Univ, Dept Comp Sci, Marietta, GA 30060 USA
[2] Oak Ridge Natl Lab, Div Life Sci, Oak Ridge, TN 37831 USA
[3] Oak Ridge Natl Lab, Div Comp Sci & Math, Oak Ridge, TN 37831 USA
Keywords
text classification; features; latent semantic indexing; reference library; neural networks; information fusion
DOI
10.1016/S0031-3203(00)00171-0
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This article reports on our experiments and results on the effectiveness of different feature sets and information fusion from some combinations of them in classifying free text documents into a given number of categories. We use different feature sets and integrate neural network learning into the method. The feature sets are based on the "latent semantics" of a reference library - a collection of documents adequately representing the desired concepts. We found that a larger reference library is not necessarily better. Information fusion almost always gives better results than the individual constituent feature sets, with certain combinations doing better than the others. © 2001 Published by Elsevier Science Ltd on behalf of Pattern Recognition Society.
Pages: 2413-2425
Page count: 13