Multiple sequence alignments of partially coding nucleic acid sequences

Cited by: 21
Authors
Stocsits, RR
Hofacker, IL
Fried, C
Stadler, PF
Affiliations
[1] Univ Vienna, Inst Theoret Chem, A-1090 Vienna, Austria
[2] Univ Leipzig, Interdisciplinary Ctr Bioinformat, D-04107 Leipzig, Germany
[3] Univ Leipzig, Dept Comp Sci, Bioinformat Grp, D-04107 Leipzig, Germany
[4] Santa Fe Inst, Santa Fe, NM 87501 USA
Keywords
DOI
10.1186/1471-2105-6-160
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: High quality sequence alignments of RNA and DNA sequences are an important prerequisite for the comparative analysis of genomic sequence data. Nucleic acid sequences, however, exhibit a much larger sequence heterogeneity compared to their encoded protein sequences due to the redundancy of the genetic code. It is desirable, therefore, to make use of the amino acid sequence when aligning coding nucleic acid sequences. In many cases, however, only a part of the sequence of interest is translated. On the other hand, overlapping reading frames may encode multiple alternative proteins, possibly with intermittent non-coding parts. Examples are, in particular, RNA virus genomes.

Results: The standard scoring scheme for nucleic acid alignments can be extended to incorporate simultaneously information on translation products in one or more reading frames. Here we present a multiple alignment tool, codaln, that implements a combined nucleic acid plus amino acid scoring model for pairwise and progressive multiple alignments that allows arbitrary weighting for almost all scoring parameters. Resource requirements of codaln are comparable with those of standard tools such as ClustalW.

Conclusion: We demonstrate the applicability of codaln to various biologically relevant types of sequences (bacteriophage Levivirus and vertebrate Hox clusters) and show that the combination of nucleic acid and amino acid sequence information leads to improved alignments. These, in turn, increase the performance of analysis tools that depend strictly on good input alignments such as methods for detecting conserved RNA secondary structure elements.
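The idea of a combined nucleic acid plus amino acid score can be illustrated with a minimal sketch. This is not codaln's implementation: the function name `combined_column_score`, the parameter names, and all score values are hypothetical choices made for illustration, and the sketch assumes a DNA alphabet, the standard genetic code, and a single reading frame. It scores one aligned pair of codons by summing nucleotide match/mismatch scores and, only where the position is marked as coding, adding a weighted amino acid term.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def combined_column_score(codon_a, codon_b, coding,
                          nt_match=1.0, nt_mismatch=-1.0,
                          aa_match=2.0, aa_mismatch=-1.0,
                          aa_weight=1.0):
    """Score one aligned codon pair (hypothetical scoring values).

    The nucleotide term is always counted. The amino acid term is added
    only when `coding` is True, i.e. when both codons lie in a translated
    region in the same reading frame.
    """
    # Nucleotide contribution: sum over the three aligned base pairs.
    nt = sum(nt_match if x == y else nt_mismatch
             for x, y in zip(codon_a, codon_b))
    if not coding:
        return nt
    # Amino acid contribution: compare the translations of the two codons.
    same_aa = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
    aa = aa_match if same_aa else aa_mismatch
    return nt + aa_weight * aa


# CTT and CTC both encode leucine: a synonymous third-position change.
synonymous = combined_column_score("CTT", "CTC", coding=True)
# CTT (Leu) versus CCT (Pro): one mismatch that changes the amino acid.
nonsynonymous = combined_column_score("CTT", "CCT", coding=True)
# The same pair in a non-coding region uses the nucleotide term only.
noncoding = combined_column_score("CTT", "CTC", coding=False)
```

With these illustrative values, the two codon pairs have the same nucleotide score (two matches, one mismatch), but the synonymous pair scores higher once the amino acid term is added. A full aligner would embed such a score in a dynamic programming recursion with gap penalties and would sum amino acid terms over every annotated reading frame, which this sketch omits.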
Pages: 8