CD-HIT Suite: a web server for clustering and comparing biological sequences

Cited by: 2012
Authors
Huang, Ying [1]
Niu, Beifang [1]
Gao, Ying [1]
Fu, Limin [1]
Li, Weizhong [1]
Affiliations
[1] Univ Calif San Diego, Calif Inst Telecommun & Informat Technol, La Jolla, CA 92093 USA
Funding
National Institutes of Health (USA)
Keywords
PROTEIN;
DOI
10.1093/bioinformatics/btq003
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
CD-HIT is a widely used program for clustering and comparing large biological sequence datasets. In order to further assist the CD-HIT users, we significantly improved this program with more functions and better accuracy, scalability and flexibility. Most importantly, we developed a new web server, CD-HIT Suite, for clustering a user-uploaded sequence dataset or comparing it to another dataset at different identity levels. Users can now interactively explore the clusters within web browsers. We also provide downloadable clusters for several public databases (NCBI NR, Swissprot and PDB) at different identity levels.
Pages: 680-682
Page count: 3