Search and clustering orders of magnitude faster than BLAST

Cited by: 16361
Authors
Edgar, Robert C.
Keywords
FAMILIES DATABASE; ALIGNMENT; PROTEIN; TIME;
DOI
10.1093/bioinformatics/btq461
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification. Results: UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets.
Pages: 2460-2461
Number of pages: 2
Related papers
8 items in total
[1]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[2]  
Butte Atul J., 2001, Trends in Biotechnology, V19, P159, DOI 10.1016/S0167-7799(01)01603-1
[3]   Bacterial Community Variation in Human Body Habitats Across Space and Time [J].
Costello, Elizabeth K. ;
Lauber, Christian L. ;
Hamady, Micah ;
Fierer, Noah ;
Gordon, Jeffrey I. ;
Knight, Rob .
SCIENCE, 2009, 326 (5960) :1694-1697
[4]   MUSCLE: multiple sequence alignment with high accuracy and high throughput [J].
Edgar, RC .
NUCLEIC ACIDS RESEARCH, 2004, 32 (05) :1792-1797
[5]   Local homology recognition and distance measures in linear time using compressed amino acid alphabets [J].
Edgar, RC .
NUCLEIC ACIDS RESEARCH, 2004, 32 (01) :380-385
[6]   The Pfam protein families database [J].
Finn, Robert D. ;
Tate, John ;
Mistry, Jaina ;
Coggill, Penny C. ;
Sammut, Stephen John ;
Hotz, Hans-Rudolf ;
Ceric, Goran ;
Forslund, Kristoffer ;
Eddy, Sean R. ;
Sonnhammer, Erik L. L. ;
Bateman, Alex .
NUCLEIC ACIDS RESEARCH, 2008, 36 :D281-D288
[7]   Rfam: updates to the RNA families database [J].
Gardner, Paul P. ;
Daub, Jennifer ;
Tate, John G. ;
Nawrocki, Eric P. ;
Kolbe, Diana L. ;
Lindgreen, Stinus ;
Wilkinson, Adam C. ;
Finn, Robert D. ;
Griffiths-Jones, Sam ;
Eddy, Sean R. ;
Bateman, Alex .
NUCLEIC ACIDS RESEARCH, 2009, 37 :D136-D140
[8]   Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences [J].
Li, Weizhong ;
Godzik, Adam .
BIOINFORMATICS, 2006, 22 (13) :1658-1659