A hierarchical neural network document classifier with linguistic feature selection

Cited by: 19
Authors
Chen, CM
Lee, HM
Hwang, CW
Affiliations
[1] Natl Hualien Univ Educ, Grad Inst Learning Technol, Hualien 970, Taiwan
[2] Natl Taiwan Univ Sci & Technol, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan
Keywords
information retrieval; hierarchical document classifier; back-propagation neural network; feature selection
DOI
10.1007/s10489-005-4613-0
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This article presents a neural network document classifier with linguistic feature selection and multi-category output. It consists of a feature selection unit and a hierarchical neural network classification unit. In the feature selection unit, candidate terms are extracted from the original documents by text processing techniques, and the conformity and uniformity of each term are then analyzed with an entropy function that measures the significance of terms. Terms with high significance are selected as input features for training the neural network document classifiers. To reduce the input dimensions, a fuzzy-relation composition mechanism is employed to identify synonyms, from which a synonym thesaurus can be constructed. To simplify the learning scheme, the well-known back-propagation learning model is used to build the hierarchical classification units. The experiments use a product description database from an e-commerce company. The results show that the classifier is accurate enough to assist human classifiers, and that it can save considerable manpower and time when classifying a large database.
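The abstract names two building blocks without giving formulas: an entropy function for term significance and a composition of fuzzy relations for finding synonyms. The Python sketch below illustrates the general idea only. It assumes that significance is one minus the normalized Shannon entropy of a term's counts across categories, and that the composition is the standard max-min composition; the paper's actual conformity and uniformity measures may differ. The function names `term_category_entropy`, `significance`, and `max_min_composition` are hypothetical.

```python
import math


def term_category_entropy(counts):
    """Normalized Shannon entropy of a term's counts across categories.

    Assumption: counts[i] is how often the term occurs in category i.
    Returns a value in [0, 1]; 0 means the term is concentrated in a
    single category, 1 means it is spread evenly over all categories.
    """
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(len(counts))


def significance(counts):
    """Illustrative significance score: low entropy means high significance."""
    return 1.0 - term_category_entropy(counts)


def max_min_composition(r, s):
    """Standard max-min composition of two fuzzy relations (nested lists).

    result[i][j] = max over k of min(r[i][k], s[k][j]).
    """
    return [
        [max(min(r[i][k], s[k][j]) for k in range(len(s)))
         for j in range(len(s[0]))]
        for i in range(len(r))
    ]


# Hypothetical term-to-term similarity relation over three terms.
similarity = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.6],
    [0.0, 0.6, 1.0],
]
# Composing the relation with itself exposes indirect similarity:
# terms 0 and 2 become related through term 1.
indirect = max_min_composition(similarity, similarity)
```

In this sketch a term that occurs in only one category scores 1.0 and a term spread evenly over all categories scores 0.0, so a threshold on `significance` would act as the feature filter. Composing the similarity relation with itself links term pairs that share a common neighbor, which is one plausible route to a synonym thesaurus.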
Pages: 277-294
Page count: 18