Determining functional specificity from protein sequences

Cited by: 18
Authors
Donald, JE [1]
Shakhnovich, EI [1]
Affiliations
[1] Harvard Univ, Dept Chem & Biol Chem, Cambridge, MA 02138 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
DOI
10.1093/bioinformatics/bti396
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Given a large family of homologous protein sequences, many methods can divide the family into smaller groups that correspond to the different functions carried out by proteins within the family. One important problem, however, has been the absence of a general method for selecting an appropriate level of granularity, or size of the groups.
Results: We propose a consistent way of choosing the granularity that is independent of the sequence similarity and sequence clustering method used. We study three large, well-investigated protein families: basic leucine zippers, nuclear receptors and proteins with three consecutive C2H2 zinc fingers. Our method is tested against known functional information, the experimentally determined binding specificities, using a simple scoring method. The significance of the groups is also measured by randomizing the data. Finally, we compare our algorithm against a popular method of grouping proteins, the TRIBE-MCL method. In the end, we determine that dividing the families at the proposed level of granularity creates very significant and useful groups of proteins that correspond to the different DNA-binding motifs. We expect that such groupings will be useful in studying not only DNA binding but also other protein interactions.
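The abstract mentions scoring groups against known binding specificities and measuring significance by randomizing the data, but it does not give the scoring function. The following is a minimal sketch of that general idea, not the authors' method: it assumes a simple purity score (the fraction of proteins whose known specificity matches the majority specificity in their group) and a label-shuffling permutation test. The names `purity` and `permutation_p_value`, and the toy data, are hypothetical.

```python
import random
from collections import Counter


def purity(groups, labels):
    """Fraction of items whose label equals the majority label of their group.

    Assumed stand-in for the paper's "simple scoring method", which the
    abstract does not specify.
    """
    by_group = {}
    for group_id, label in zip(groups, labels):
        by_group.setdefault(group_id, []).append(label)
    # For each group, count how many members carry its most common label.
    matched = sum(Counter(members).most_common(1)[0][1]
                  for members in by_group.values())
    return matched / len(labels)


def permutation_p_value(groups, labels, n_shuffles=1000, seed=0):
    """Estimate how often randomly reassigned labels score at least as well."""
    rng = random.Random(seed)
    observed = purity(groups, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # randomize labels, keep group sizes fixed
        if purity(groups, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)


# Toy data (hypothetical): three groups that match three binding motifs.
groups = [0, 0, 0, 1, 1, 1, 2, 2, 2]
labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
observed, p_value = permutation_p_value(groups, labels)
print(observed, p_value)
```

Shuffling the labels while keeping the grouping fixed preserves the group sizes, so the null distribution reflects only the chance agreement between a grouping of that shape and the known specificities.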
Pages: 2629-2635
Page count: 7