Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework

Cited by: 620
Authors
Katoh, Kazutaka [2]
Toh, Hiroyuki [1]
Affiliations
[1] Kyushu Univ, Med Inst Bioregulat, Fukuoka 8128582, Japan
[2] Kyushu Univ, Digital Med Initiat, Fukuoka 8128582, Japan
DOI
10.1186/1471-2105-9-212
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Structural alignment of RNAs is becoming important, since the discovery of functional non-coding RNAs (ncRNAs). Recent studies, mainly based on various approximations of the Sankoff algorithm, have resulted in considerable improvement in the accuracy of pairwise structural alignment. In contrast, for the cases with more than two sequences, the practical merit of structural alignment remains unclear as compared to traditional sequence-based methods, although the importance of multiple structural alignment is widely recognized.
Results: We took a different approach from a straightforward extension of the Sankoff algorithm to the multiple alignments from the viewpoints of accuracy and time complexity. As a new option of the MAFFT alignment program, we developed a multiple RNA alignment framework, X-INS-i, which builds a multiple alignment with an iterative method incorporating structural information through two components: (1) pairwise structural alignments by an external pairwise alignment method such as SCARNA or LaRA and (2) a new objective function, Four-way Consistency, derived from the base-pairing probability of every sub-aligned group at every multiple alignment stage.
Conclusion: The BRAliBASE benchmark showed that X-INS-i outperforms other methods currently available in the sum-of-pairs score (SPS) criterion. As a basis for predicting common secondary structure, the accuracy of the present method is comparable to or rather higher than those of the current leading methods such as RNA Sampler. The X-INS-i framework can be used for building a multiple RNA alignment from any combination of algorithms for pairwise RNA alignment and base-pairing probability. The source code is available at the webpage found in the Availability and requirements section.
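The sum-of-pairs score (SPS) criterion named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition: the fraction of residue pairs aligned in a reference alignment that are also aligned in the test alignment. The function names and the toy sequences are hypothetical, chosen for illustration only; they are not part of MAFFT, X-INS-i, or the BRAliBASE tooling.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings ('-' marks a gap).
    Each residue is identified as (sequence_index, ungapped_position).
    """
    if not alignment:
        return set()
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have the same length")

    next_position = [0] * len(alignment)  # ungapped position counter per sequence
    pairs = set()
    for col in range(len(alignment[0])):
        residues_in_column = []
        for seq_index, row in enumerate(alignment):
            if row[col] != "-":
                residues_in_column.append((seq_index, next_position[seq_index]))
                next_position[seq_index] += 1
        # Every pair of residues in the same column counts as one aligned pair.
        pairs.update(combinations(residues_in_column, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference aligned pairs that the test alignment reproduces."""
    reference_pairs = aligned_pairs(reference)
    if not reference_pairs:
        raise ValueError("reference alignment contains no aligned residue pairs")
    return len(reference_pairs & aligned_pairs(test)) / len(reference_pairs)


# Toy example (hypothetical sequences ACGU and AGU).
reference = ["ACGU", "A-GU"]
test = ["ACGU", "AG-U"]
print(sum_of_pairs_score(test, reference))
```

In this toy case the test alignment reproduces two of the three aligned pairs in the reference, so the score is 2/3; an alignment identical to the reference scores 1.0.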
Pages: 13