UniqueProt: creating representative protein sequence sets

Cited by: 98

Authors:

Mika, S

Rost, B

Affiliations:

[1] Columbia Univ, Dept Biochem & Mol Biophys, CUBIC, New York, NY 10032 USA

[2] Univ Witten Herdecke, Inst Phy Biochem, D-58448 Witten, Germany

[3] Columbia Univ, Ctr Computat Biol & Bioinformat, New York, NY 10032 USA

[4] Columbia Univ, Dept Biochem & Mol Biophys, NE Struct Genom Consortium, New York, NY 10032 USA

Source:

NUCLEIC ACIDS RESEARCH | 2003 / Vol. 31 / Issue 13

Keywords:

DOI:

10.1093/nar/gkg620

Chinese Library Classification:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Subject classification codes:

071010 ; 081704 ;

Abstract:

UniqueProt is a practical and easy-to-use web service designed to create representative, unbiased data sets of protein sequences. The largest possible representative sets are found through a simple greedy algorithm that uses the HSSP-value to establish sequence similarity. UniqueProt is not a real clustering program, in the sense that the 'representatives' are not at the centres of well-defined clusters, since the definition of such clusters is problem-specific. Overall, UniqueProt is a reasonably fast solution for bias in data sets. The service is accessible at http://cubic.bioc.columbia.edu/services/uniqueprot; a command-line version for Linux is downloadable from this web site.
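The abstract describes the selection only as "a simple greedy algorithm" driven by HSSP-values. The sketch below shows one plausible first-fit greedy filter of that kind; it is not the UniqueProt implementation. It assumes that pairwise HSSP-values have already been computed, that pairs absent from the table are unrelated, and that a pair counts as similar when its value exceeds a cutoff. The function name `greedy_representatives`, the `hssp_values` table, and the default `threshold` of 0.0 are hypothetical and chosen for illustration only.

```python
from typing import Dict, FrozenSet, List


def greedy_representatives(
    ids: List[str],
    hssp_values: Dict[FrozenSet[str], float],
    threshold: float = 0.0,  # hypothetical cutoff, not taken from the paper
) -> List[str]:
    """Keep a protein only if it is not similar to any protein already kept.

    Assumes `ids` contains no duplicates. Pairs missing from `hssp_values`
    are treated as unrelated (value of minus infinity).
    """
    kept: List[str] = []
    for pid in ids:
        similar_to_kept = any(
            hssp_values.get(frozenset((pid, other)), float("-inf")) > threshold
            for other in kept
        )
        if not similar_to_kept:
            kept.append(pid)
    return kept


# Toy input: A and B are similar (value above the cutoff); C is similar to neither.
pairs = {
    frozenset(("A", "B")): 12.0,
    frozenset(("B", "C")): -5.0,
    frozenset(("A", "C")): -3.0,
}
representatives = greedy_representatives(["A", "B", "C"], pairs)
print(representatives)  # ['A', 'C']
```

A first-fit pass like this depends on input order: a different ordering of `ids` can yield a different, and possibly smaller, representative set. Whether and how UniqueProt orders its input is not stated in this record.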

Pages: 3789-3791

Page count: 3
