Recent developments in the MAFFT multiple sequence alignment program

Times cited: 3048
Authors
Katoh, Kazutaka [1]
Toh, Hiroyuki [1]
Affiliation
[1] Kyushu Univ, Med Inst Bioregulat, Digital Med Initiat, Fukuoka 8128582, Japan
Keywords
large-scale sequence alignment; fast tree building; ncRNA structural alignment; amino acid sequence alignment
DOI
10.1093/bib/bbn013
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The accuracy and scalability of multiple sequence alignment (MSA) of DNA and protein sequences have long been, and remain, important issues in bioinformatics. To rapidly construct a reasonable MSA, we developed the initial version of the MAFFT program in 2002. MSA software now faces greater challenges in both scalability and accuracy than it did five years ago. As large-scale sequencing projects generate ever-increasing amounts of sequence data, scalability has become critical in many situations. Accuracy requirements have also entered a new stage since the discovery of functional noncoding RNAs (ncRNAs): constructing a high-quality alignment of distantly related ncRNAs requires taking their secondary structure into account. To address these problems, in 2007 we updated MAFFT to version 6 with two new techniques: the PartTree algorithm and the Four-way consistency objective function. The former improved the scalability of progressive alignment, and the latter improved the accuracy of ncRNA alignment. We review these and other techniques used in MAFFT and suggest possible future directions for MSA software as a basis for comparative analyses. MAFFT is available at http://align.bmr.kyushu-u.ac.jp/mafft/software/.
Pages: 286-298
Number of pages: 13