Simulated annealing algorithm for the multiple sequence alignment problem:: The approach of polymers in a random medium -: art. no. 031915

被引:5
作者
Hernández-Guía, M
Mulet, R
Rodríguez-Pérez, S
机构
[1] Univ Havana, Fac Phys, Henri Pioncare Grp Complex Syst, Havana 10400, Cuba
[2] Natl Bioinformat Ctr, Havana 10200, Cuba
来源
PHYSICAL REVIEW E | 2005年 / 72卷 / 03期
关键词
D O I
10.1103/PhysRevE.72.031915
中图分类号
O35 [流体力学]; O53 [等离子体物理学];
学科分类号
070204 ; 080103 ; 080704 ;
摘要
We propose a probabilistic algorithm to solve the multiple sequence alignment problem. The algorithm is a simulated annealing that exploits the representation of the multiple alignment between D sequences as a directed polymer in D dimensions. Within this representation we can easily track the evolution of the alignment through local moves of low computational cost. In contrast with other probabilistic algorithms proposed to solve this problem, our approach allows the creation and deletion of gaps without extra computational cost. The algorithm was tested by aligning proteins from the kinase family. When D=3 the results are consistent with those obtained using a complete algorithm. For D>3 where the complete algorithm fails, we show that our algorithm still converges to reasonable alignments. We also study the space of solutions obtained and show that depending on the number of sequences aligned the solutions are organized in different ways, suggesting a possible source of errors for progressive algorithms. Finally, we test our algorithm in artificially generated sequences and prove that it may perform better than progressive algorithms. Moreover, in those cases in which a progressive algorithm works better, its solution may be taken as an initial condition of our algorithm and, again, we obtain alignments with lower scores and more relevant from the biological point of view.
引用
收藏
页数:9
相关论文
共 15 条
[1]  
[Anonymous], 2000, INTRO COMPUTATIONAL
[2]   THE MULTIPLE SEQUENCE ALIGNMENT PROBLEM IN BIOLOGY [J].
CARRILLO, H ;
LIPMAN, D .
SIAM JOURNAL ON APPLIED MATHEMATICS, 1988, 48 (05) :1073-1082
[3]  
Durbin R., 1998, BIOL SEQUENCE ANAL
[4]   STOCHASTIC RELAXATION, GIBBS DISTRIBUTIONS, AND THE BAYESIAN RESTORATION OF IMAGES [J].
GEMAN, S ;
GEMAN, D .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1984, 6 (06) :721-741
[5]  
GODZIK A, 1994, COMPUT APPL BIOSCI, V10, P587
[6]  
GUNSFIELD D, 1999, ALGORITHMS STRINGS T
[7]   Similarity detection and localization [J].
Hwa, T ;
Lassig, M .
PHYSICAL REVIEW LETTERS, 1996, 76 (14) :2591-2594
[8]  
ISHIKAWA M, 1993, COMPUT APPL BIOSCI, V9, P267
[9]  
KIM J, 1994, COMPUT APPL BIOSCI, V10, P419
[10]   OPTIMIZATION BY SIMULATED ANNEALING [J].
KIRKPATRICK, S ;
GELATT, CD ;
VECCHI, MP .
SCIENCE, 1983, 220 (4598) :671-680