A general edit distance between RNA structures

被引:139
作者
Jiang, T
Lin, GH
Ma, B
Zhang, KZ
机构
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
[2] Univ Calif Riverside, Dept Comp Sci, Riverside, CA 92521 USA
[3] Univ Western Ontario, Dept Comp Sci, London, ON N6A 5B7, Canada
关键词
sequence annotation; string edit; dynamic programming; approximation algorithm; time efficiency;
D O I
10.1089/10665270252935511
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Arc-annotated sequences are useful in representing the structural information of RNA sequences. In general, RNA secondary and tertiary structures can be represented as a set of nested arcs and a set of crossing arcs, respectively. Since RNA functions are largely determined by molecular confirmation and therefore secondary and tertiary structures, the comparison between RNA secondary and tertiary structures has received much attention recently. In this paper, we propose the notion of edit distance to measure the similarity between two RNA secondary and tertiary structures, by incorporating various edit operations performed on both bases and arcs (i.e., base-pairs). Several algorithms are presented to compute the edit distance between two RNA sequences with various arc structures and under various score schemes, either exactly or approximately, with provably good performance. Preliminary experimental tests confirm that our definition of edit distance and the computation model are among the most reasonable ones ever studied in the literature.
引用
收藏
页码:371 / 388
页数:18
相关论文
共 26 条
[1]  
Alimonti P, 1997, LECT NOTES COMPUT SC, V1203, P288
[2]  
Bafna V, 1995, LECT NOTES COMPUT SC, V937, P1
[3]   The Ribonuclease P Database [J].
Brown, JW .
NUCLEIC ACIDS RESEARCH, 1999, 27 (01) :314-314
[4]  
CORPET F, 1994, COMPUT APPL BIOSCI, V10, P389
[5]  
Evans P., 1999, THESIS U VICTORIA
[6]   Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming [J].
Goemans, MX ;
Williamson, DP .
JOURNAL OF THE ACM, 1995, 42 (06) :1115-1145
[7]  
GOLDMAN D, 1999, IEEE P 40 ANN C FDN, P512
[8]  
Gusfield D, 1997, ALGORITHMS STRINGS T
[9]  
Jiang T, 2000, LECT NOTES COMPUT SC, V1848, P154
[10]  
Klein P., 1998, LECT NOTES COMPUTER, V1461, P91