Recent developments in the MAFFT multiple sequence alignment program

Cited by: 3048
Authors
Katoh, Kazutaka [1]
Toh, Hiroyuki [1]
Affiliations
[1] Kyushu Univ, Med Inst Bioregulat, Digital Med Initiat, Fukuoka 8128582, Japan
Keywords
large-scale sequence alignment; fast tree building; ncRNA structural alignment; amino acid sequence alignment;
DOI
10.1093/bib/bbn013
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010; 081704;
Abstract
The accuracy and scalability of multiple sequence alignment (MSA) of DNAs and proteins have long been and are still important issues in bioinformatics. To rapidly construct a reasonable MSA, we developed the initial version of the MAFFT program in 2002. MSA software is now facing greater challenges in both scalability and accuracy than those of 5 years ago. As increasing amounts of sequence data are being generated by large-scale sequencing projects, scalability is now critical in many situations. The requirement of accuracy has also entered a new stage since the discovery of functional noncoding RNAs (ncRNAs); the secondary structure should be considered for constructing a high-quality alignment of distantly related ncRNAs. To deal with these problems, in 2007, we updated MAFFT to Version 6 with two new techniques: the PartTree algorithm and the Four-way consistency objective function. The former improved the scalability of progressive alignment and the latter improved the accuracy of ncRNA alignment. We review these and other techniques that MAFFT uses and suggest possible future directions of MSA software as a basis of comparative analyses. MAFFT is available at http://align.bmr.kyushu-u.ac.jp/mafft/software/.
Pages: 286-298
Page count: 13