Clustering of highly homologous sequences to reduce the size of large protein databases

Cited by: 802
Authors
Li, WZ
Jaroszewski, L
Godzik, A [1]
Affiliations
[1] San Diego Supercomp Ctr, La Jolla, CA 92093 USA
[2] Burnham Inst, La Jolla, CA 92037 USA
DOI
10.1093/bioinformatics/17.3.282
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
We present a fast and flexible program for clustering large protein databases at different sequence identity levels. It takes less than 2 h for the all-against-all sequence comparison and clustering of the non-redundant protein database of over 560 000 sequences on a high-end PC. The output database, including only the representative sequences, can be used for more efficient and sensitive database searches.
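The idea of clustering at a chosen sequence identity level and keeping one representative per cluster can be illustrated with a minimal sketch. This is not the authors' program: the greedy longest-first ordering, the function names `identity` and `cluster`, the 0.9 threshold, and the `difflib`-based identity score are all assumptions made for illustration, and the toy score is far slower and cruder than a real alignment-based comparison.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Toy identity score (an assumption, not a real alignment):
    residues in matching blocks divided by the shorter sequence length."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def cluster(seqs: dict[str, str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Greedy clustering: visit sequences longest first; a sequence joins the
    first representative it matches at or above `threshold`, otherwise it
    becomes the representative of a new cluster."""
    clusters: dict[str, list[str]] = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative is similar enough.
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from any real database.
    example = {
        "a": "MKTAYIAKQRQISFVKSHFSRQ",
        "b": "MKTAYIAKQRQISFVKSHFSRA",
        "c": "GGGGSSSSPPPPLLLLWWWW",
    }
    result = cluster(example, threshold=0.9)
    # The keys of `result` form the reduced, representative-only set.
    print(result)
```

The keys of the returned dictionary play the role of the output database of representative sequences; lowering `threshold` merges more sequences into each cluster and shrinks that set further.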
Pages: 282-283
Page count: 2