Automatic classification of Tamil documents using vector space model and artificial neural network

Cited by: 37
Authors
Rajan, K. [1]
Ramalingam, V. [1]
Ganesan, M. [2]
Palanivel, S. [1]
Palaniappan, B. [1]
Affiliations
[1] Annamalai Univ, Dept Comp Sci & Engn, Annamalainagar, Chidambaram, India
[2] Annamalai Univ, Ctr Adv Studies Linguist, Annamalainagar, Chidambaram, India
Keywords
Tamil text classification; Vector space model; Artificial neural network model; Corpus building
DOI
10.1016/j.eswa.2009.02.010
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic text classification methods based on the vector space model (VSM), artificial neural networks (ANN), K-nearest neighbor (KNN), Naive Bayes (NB) and support vector machines (SVM) have been applied to English-language documents and have gained popularity among text mining and information retrieval (IR) researchers. This paper proposes the application of VSM and ANN to the classification of Tamil-language documents. Tamil is a morphologically rich Dravidian classical language. The development of the internet has led to an exponential increase in the number of electronic documents, not only in English but also in other regional languages. The automatic classification of Tamil documents has not been explored in detail so far. In this paper, a corpus is used to construct and test the VSM and ANN models. Methods of document representation and of assigning weights that reflect the importance of each term are discussed. In a traditional word-matching based categorization system, the most popular document representation is the VSM. This method needs a high-dimensional space to represent the documents. The ANN classifier requires a smaller number of features. The experimental results show that the ANN model achieves 93.33% on Tamil document classification, which is better than the performance of the VSM, which yields 90.33%. (C) 2009 Elsevier Ltd. All rights reserved.
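Illustrative sketch (not part of the original record): the abstract does not state which term-weighting scheme or similarity measure the paper uses, so the following assumes TF-IDF weights and cosine similarity, a common choice for VSM classification. The function names (`idf_table`, `vectorize`, `cosine`, `classify`) and the toy English tokens are hypothetical placeholders; a real system would operate on tokenized Tamil text from a labeled corpus.

```python
import math
from collections import Counter


def idf_table(docs):
    """Inverse document frequency for every term in the training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / df[term]) for term in df}


def vectorize(doc, idf):
    """Sparse TF-IDF vector {term: weight}; terms unseen in training are dropped."""
    tf = Counter(doc)
    return {term: tf[term] * idf[term] for term in tf if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(doc, train_vecs, labels, idf):
    """Return the label of the training document most similar to `doc`."""
    query = vectorize(doc, idf)
    best = max(range(len(train_vecs)), key=lambda i: cosine(query, train_vecs[i]))
    return labels[best]


# Toy data: placeholder tokens standing in for a labeled corpus.
train_docs = [
    ["cricket", "match", "score"],
    ["election", "vote", "party"],
    ["match", "team", "goal"],
]
train_labels = ["sports", "politics", "sports"]

idf = idf_table(train_docs)
train_vecs = [vectorize(doc, idf) for doc in train_docs]

print(classify(["team", "score", "match"], train_vecs, train_labels, idf))
print(classify(["vote", "party"], train_vecs, train_labels, idf))
```

Each distinct term contributes one dimension to the vectors, which is the high-dimensional representation the abstract refers to; the sparse dictionaries here store only the nonzero weights.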
Pages: 10914-10918
Number of pages: 5