MAFFT version 5: improvement in accuracy of multiple sequence alignment

Times cited: 4026
Authors
Katoh, K [1]
Kuma, K
Toh, H
Miyata, T
Affiliations
[1] Kyoto Univ, Bioinformat Ctr, Inst Chem Res, Kyoto 6110011, Japan
[2] Waseda Univ, Sch Engn, Dept Elect Engn & Biosci, Tokyo 1698555, Japan
[3] Kyoto Univ, Dept Biophys, Grad Sch Sci, Kyoto 6068502, Japan
DOI
10.1093/nar/gki198
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
The accuracy of the multiple sequence alignment program MAFFT has been improved. The new version (5.3) of MAFFT offers new iterative refinement options, H-INS-i, F-INS-i and G-INS-i, in which pairwise alignment information is incorporated into the objective function. These new options showed higher accuracy than currently available methods, including T-Coffee version 2 and CLUSTAL W, in benchmark tests consisting of alignments of >50 sequences. Like the previously available options, the new options can handle hundreds of sequences on a standard desktop computer. We also examined the effect of the number of homologues included in an alignment. For a multiple alignment of ~8 sequences with low similarity, accuracy improved by 2-10 percentage points when the sequences were aligned together with dozens of their close homologues (E-value < 10^-5 to 10^-20) collected from a database. This improvement was observed for most methods but was remarkably large for the new options of MAFFT proposed here. We therefore wrote a Ruby script, mafftE.rb, which aligns the input sequences together with their close homologues collected from SwissProt using NCBI-BLAST.
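One way to picture "pairwise alignment information incorporated into the objective function" is a COFFEE-style consistency score: the weighted fraction of residue pairs from a library of pairwise alignments that a candidate multiple alignment also places in the same column. The sketch below is illustrative only. The function names, the dictionary layout of the library and the unit weights are hypothetical, and this is not MAFFT's actual objective function, which the abstract does not spell out.

```python
def residue_columns(row):
    """Map each residue index in a gapped MSA row to its column ('-' = gap)."""
    cols, k = {}, 0
    for col, ch in enumerate(row):
        if ch != "-":
            cols[k] = col
            k += 1
    return cols


def pairwise_pairs(row_a, row_b):
    """Residue-index pairs (i, j) aligned in a pairwise alignment of two gapped rows."""
    pairs, i, j = set(), 0, 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def consistency_score(msa, library):
    """Weighted fraction of library residue pairs that the MSA also aligns.

    msa: list of equal-length gapped rows.
    library: hypothetical layout {(s, t): (weight, {(i, j), ...})} for s < t.
    """
    maps = [residue_columns(r) for r in msa]
    agree = total = 0.0
    for (s, t), (weight, pairs) in library.items():
        for i, j in pairs:
            total += weight
            col = maps[s].get(i)
            if col is not None and col == maps[t].get(j):
                agree += weight
    return agree / total if total else 0.0


# Toy example: three sequences ACGT, AGT, ACT with unit-weight pairwise alignments.
library = {
    (0, 1): (1.0, pairwise_pairs("ACGT", "A-GT")),
    (0, 2): (1.0, pairwise_pairs("ACGT", "AC-T")),
    (1, 2): (1.0, pairwise_pairs("AGT", "ACT")),
}
msa = ["ACGT", "A-GT", "AC-T"]
score = consistency_score(msa, library)
print(round(score, 3))  # 8 of 9 library pairs are reproduced -> 0.889
```

In this toy case the candidate alignment reproduces both pairwise alignments involving the first sequence but separates the G/C pair that the direct AGT/ACT alignment matched, so one of nine library pairs is lost. An iterative refinement step would prefer rearrangements that raise such a score, typically combined with an ordinary substitution-based score.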
Pages: 511-518
Page count: 8
References (29 total)
[1] Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997, 25(17):3389-3402.
[2] Bateman A, et al. Nucleic Acids Research, 2004, 32:D138. DOI 10.1093/nar/gkh121.
[3] Berger MP. Computer Applications in the Biosciences, 1991, 7:479.
[4] Chandonia JM, Hon G, Walker NS, Lo Conte L, Koehl P, Levitt M, Brenner SE. The ASTRAL Compendium in 2004. Nucleic Acids Research, 2004, 32:D189-D192.
[5] Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 2004, 32(5):1792-1797.
[6] Feng DF, Doolittle RF. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution, 1987, 25(4):351-360.
[7] Gotoh O. Computer Applications in the Biosciences, 1995, 11:543.
[8] Gotoh O. Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. Journal of Molecular Biology, 1996, 264(4):823-838.
[9] Gotoh O. Computer Applications in the Biosciences, 1993, 9:361.
[10] Grasso C, Lee C. Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems. Bioinformatics, 2004, 20(10):1546-1556.