Natural language tagging with genetic algorithms

Cited by: 16
Authors
Alba, Enrique [1]
Luque, Gabriel [1]
Araujo, Lourdes [1]
Affiliations
[1] Univ Malaga, Dept Lenguajes & Ciencias Computac, E-29071 Malaga, Spain
Keywords
genetic algorithms; CHC algorithm; natural language processing; part-of-speech tagging; parallel algorithms
DOI
10.1016/j.ipl.2006.07.002
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This work analyzes the relative advantages of different metaheuristic approaches to the well-known natural language processing problem of part-of-speech tagging. The problem consists of assigning to each word of a text its disambiguated part of speech according to the context in which the word is used. We have applied a classic genetic algorithm (GA), a CHC algorithm, and simulated annealing (SA). Different ways of encoding the solutions to the problem (integer and binary) have been studied, as well as the impact of using parallelism for each of the considered methods. We have performed experiments on different linguistic corpora and compared the results against other popular approaches plus a classic dynamic programming algorithm. Our results show the high performance achieved by the parallel algorithms compared to the sequential ones, and identify the particular advantages of each technique. Our algorithms and some of their components can be used to represent a new set of state-of-the-art procedures for complex tagging scenarios. (c) 2006 Elsevier B.V. All rights reserved.
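To make the integer encoding mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each gene is an index into the list of candidate tags for the word at that position, and that fitness is a product of tag-bigram scores. The lexicon, the bigram scores, and the names `LEXICON`, `BIGRAM`, `fitness`, and `evolve` are all hypothetical and chosen only for illustration; the abstract does not specify the fitness function or the operators used in the paper.

```python
import random

# Hypothetical toy lexicon: candidate tags per word (not taken from the paper).
LEXICON = {
    "the": ["DET"],
    "can": ["NOUN", "VERB", "AUX"],
    "rusts": ["VERB", "NOUN"],
}

# Hypothetical tag-bigram scores standing in for corpus-derived context statistics.
BIGRAM = {
    ("<s>", "DET"): 0.6,
    ("DET", "NOUN"): 0.7,
    ("DET", "VERB"): 0.05,
    ("DET", "AUX"): 0.05,
    ("NOUN", "VERB"): 0.5,
    ("NOUN", "NOUN"): 0.2,
    ("VERB", "NOUN"): 0.3,
    ("AUX", "VERB"): 0.6,
}
DEFAULT_SCORE = 0.01  # assumed score for unseen tag pairs


def decode(sentence, genes):
    """Map an integer chromosome to a tag sequence (one gene per word)."""
    return [LEXICON[word][gene] for word, gene in zip(sentence, genes)]


def fitness(sentence, genes):
    """Product of bigram scores along the decoded tag sequence."""
    tags = ["<s>"] + decode(sentence, genes)
    score = 1.0
    for prev, cur in zip(tags, tags[1:]):
        score *= BIGRAM.get((prev, cur), DEFAULT_SCORE)
    return score


def evolve(sentence, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Generational GA with binary tournaments, one-point crossover, elitism."""
    rng = random.Random(seed)
    pop = [[rng.randrange(len(LEXICON[w])) for w in sentence]
           for _ in range(pop_size)]

    def pick():
        # Binary tournament: keep the fitter of two random individuals.
        a, b = rng.sample(pop, 2)
        return a if fitness(sentence, a) >= fitness(sentence, b) else b

    for _ in range(generations):
        best = max(pop, key=lambda g: fitness(sentence, g))
        children = [best[:]]  # elitism: carry the best individual over
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, len(sentence)) if len(sentence) > 1 else 0
            child = p1[:cut] + p2[cut:]
            for i, word in enumerate(sentence):
                if rng.random() < p_mut:
                    # Mutation: resample the tag index for this word.
                    child[i] = rng.randrange(len(LEXICON[word]))
            children.append(child)
        pop = children

    best = max(pop, key=lambda g: fitness(sentence, g))
    return decode(sentence, best)


if __name__ == "__main__":
    print(evolve(["the", "can", "rusts"]))
```

In this sketch the chromosome length equals the sentence length and every gene stays within the range of valid tags for its word, so crossover and mutation never produce an invalid tagging. A binary encoding, the other option named in the abstract, would need a mapping from bit groups to tag indices and a rule for handling out-of-range values.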
Pages: 173-182
Page count: 10