MUSCLE: a multiple sequence alignment method with reduced time and space complexity

Times cited: 4938
Authors
Edgar, RC [1]
Affiliations
[1] Univ Calif Berkeley, Dept Plant & Microbial Biol, Berkeley, CA 94720 USA
DOI
10.1186/1471-2105-5-113
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: In a previous paper, we introduced MUSCLE, a new program for creating multiple alignments of protein sequences, giving a brief summary of the algorithm and showing MUSCLE to achieve the highest scores reported to date on four alignment accuracy benchmarks. Here we present a more complete discussion of the algorithm, describing several previously unpublished techniques that improve biological accuracy and/or computational complexity. We introduce a new option, MUSCLE-fast, designed for high-throughput applications. We also describe a new protocol for evaluating objective functions that align two profiles. Results: We compare the speed and accuracy of MUSCLE with CLUSTALW, Progressive POA and the MAFFT script FFTNS1, the fastest previously published program known to the author. Accuracy is measured using four benchmarks: BAliBASE, PREFAB, SABmark and SMART. We test three variants that offer highest accuracy (MUSCLE with default settings), highest speed (MUSCLE-fast), and a carefully chosen compromise between the two (MUSCLE-prog). We find MUSCLE-fast to be the fastest algorithm on all test sets, achieving average alignment accuracy similar to CLUSTALW in times that are typically two to three orders of magnitude less. MUSCLE-fast is able to align 1,000 sequences of average length 282 in 21 seconds on a current desktop computer. Conclusions: MUSCLE offers a range of options that provide improved speed and/or alignment accuracy compared with currently available programs. MUSCLE is freely available at http://www.drive5.com/muscle.
Pages: 1 / 19
Number of pages: 19