Clustering of highly homologous sequences to reduce the size of large protein databases

Cited by: 802

Authors:

Li, WZ

Jaroszewski, L

Godzik, A [1]

Affiliations:

[1] San Diego Supercomp Ctr, La Jolla, CA 92093 USA

[2] Burnham Inst, La Jolla, CA 92037 USA

Source:

BIOINFORMATICS | 2001 / Vol. 17 / Issue 03

Keywords:

DOI:

10.1093/bioinformatics/17.3.282

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

We present a fast and flexible program for clustering large protein databases at different sequence identity levels. It takes less than 2 h for the all-against-all sequence comparison and clustering of the non-redundant protein database of over 560 000 sequences on a high-end PC. The output database, including only the representative sequences, can be used for more efficient and sensitive database searches.
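The abstract does not describe the algorithm itself, so the following is only a minimal sketch of one common way to reduce a sequence set to representatives at a chosen identity level: a greedy pass over the sequences, longest first. The function names `identity` and `greedy_cluster`, the toy sequences, and the identity measure (matched characters from Python's `difflib` divided by the shorter sequence length, standing in for a real alignment) are all assumptions made for illustration, not the authors' implementation.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Rough identity estimate (assumption): characters in common matching
    blocks divided by the length of the shorter sequence. A real tool would
    use a proper sequence alignment here."""
    if not a or not b:
        return 0.0
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    matched = sum(block.size for block in blocks)
    return matched / min(len(a), len(b))


def greedy_cluster(seqs: dict, threshold: float) -> dict:
    """Map each representative ID to the list of IDs in its cluster.

    Sequences are visited longest first. Each one joins the first
    representative it matches at >= threshold, otherwise it becomes
    a new representative."""
    clusters = {}
    ordered = sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True)
    for seq_id, seq in ordered:
        for rep_id in clusters:
            if identity(seq, seqs[rep_id]) >= threshold:
                clusters[rep_id].append(seq_id)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[seq_id] = [seq_id]
    return clusters


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from any real database.
    toy = {
        "a": "MKTAYIAKQRQISFVKSHFSRQ",
        "b": "MKTAYIAKQRQISFVKSHFSRA",
        "c": "GGGGGGGGGG",
    }
    for level in (0.9, 0.99):
        print(level, greedy_cluster(toy, level))
```

The keys of the returned dictionary play the role of the reduced, representatives-only database. Raising the threshold yields more, smaller clusters. This naive version compares each sequence against every existing representative, which would not scale to hundreds of thousands of sequences without additional filtering.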


Pages: 282-283

Page count: 2