Determining functional specificity from protein sequences

Cited by: 18
Authors
Donald, JE [1]
Shakhnovich, EI [1]
Affiliations
[1] Harvard Univ, Dept Chem & Biol Chem, Cambridge, MA 02138 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
DOI
10.1093/bioinformatics/bti396
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Given a large family of homologous protein sequences, many methods can divide the family into smaller groups that correspond to the different functions carried out by proteins within the family. One important problem, however, has been the absence of a general method for selecting an appropriate level of granularity, or size of the groups. Results: We propose a consistent way of choosing the granularity that is independent of the sequence similarity and sequence clustering method used. We study three large, well-investigated protein families: basic leucine zippers, nuclear receptors and proteins with three consecutive C2H2 zinc fingers. Our method is tested against known functional information, the experimentally determined binding specificities, using a simple scoring method. The significance of the groups is also measured by randomizing the data. Finally, we compare our algorithm against a popular method of grouping proteins, the TRIBE-MCL method. In the end, we determine that dividing the families at the proposed level of granularity creates very significant and useful groups of proteins that correspond to the different DNA-binding motifs. We expect that such groupings will be useful in studying not only DNA binding but also other protein interactions.
Pages: 2629-2635
Number of pages: 7