Analysis and prediction of functional sub-types from protein sequence alignments

Cited by: 208
Authors
Hannenhalli, SS
Russell, RB
Affiliations
[1] SmithKline Beecham Pharmaceut, Res & Dev, Bioinformat Res Grp, Harlow CM19 5AW, Essex, England
[2] SmithKline Beecham Pharmaceut, Res & Dev, Bioinformat Res Grp, King Of Prussia, PA 19406 USA
Keywords
protein function; protein structure; prediction; sequence alignment
DOI
10.1006/jmbi.2000.4036
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The increasing number and diversity of protein sequence families requires new methods to define and predict details regarding function. Here, we present a method for analysis and prediction of functional sub-types from multiple protein sequence alignments. Given an alignment and set of proteins grouped into sub-types according to some definition of function, such as enzymatic specificity, the method identifies positions that are indicative of functional differences by comparison of sub-type specific sequence profiles, and analysis of positional entropy in the alignment. Alignment positions with significantly high positional relative entropy correlate with those known to be involved in defining sub-types for nucleotidyl cyclases, protein kinases, lactate/malate dehydrogenases and trypsin-like serine proteases. We highlight new positions for these proteins that suggest additional experiments to elucidate the basis of specificity. The method is also able to predict sub-type for unclassified sequences. We assess several variations on a prediction method, and compare them to simple sequence comparisons. For assessment, we remove close homologues to the sequence for which a prediction is to be made (by a sequence identity above a threshold). This simulates situations where a protein is known to belong to a protein family, but is not a close relative of another protein of known sub-type. Considering the four families above, and a sequence identity threshold of 30%, our best method gives an accuracy of 96% compared to 80% obtained for sequence similarity and 74% for BLAST. We describe the derivation of a set of sub-type groupings derived from an automated parsing of alignments from PFAM and the SWISSPROT database, and use this to perform a large-scale assessment. The best method gives an average accuracy of 94% compared to 68% for sequence similarity and 79% for BLAST. We discuss implications for experimental design, genome annotation and the prediction of protein function and protein intra-residue distances. © 2000 Academic Press.
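The idea of scoring alignment positions by relative entropy between sub-type profiles can be illustrated with a minimal sketch. This is not the authors' implementation: the alphabet, the pseudocount, the symmetric sum of the two relative entropies, and all function names are assumptions made for illustration, and the toy sequences are hypothetical.

```python
import math
from collections import Counter

# Assumed alphabet: 20 standard amino acids plus the gap character.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def column_distribution(sequences, column, pseudocount=1.0):
    """Residue frequencies at one alignment column, smoothed by a pseudocount."""
    counts = Counter(seq[column] for seq in sequences if seq[column] in ALPHABET)
    total = sum(counts.values()) + pseudocount * len(ALPHABET)
    return {aa: (counts.get(aa, 0) + pseudocount) / total for aa in ALPHABET}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(p[aa] * math.log(p[aa] / q[aa]) for aa in ALPHABET)


def subtype_column_scores(subtype_a, subtype_b, pseudocount=1.0):
    """Score each column by the symmetric relative entropy between two sub-types.

    Assumes all sequences are aligned and therefore of equal length.
    """
    scores = []
    for col in range(len(subtype_a[0])):
        p = column_distribution(subtype_a, col, pseudocount)
        q = column_distribution(subtype_b, col, pseudocount)
        scores.append(relative_entropy(p, q) + relative_entropy(q, p))
    return scores


# Hypothetical toy alignment: the two sub-types differ only at column 3 (K vs R).
subtype_a = ["ACDKG", "ACDKG", "ACEKG"]
subtype_b = ["ACDRG", "ACDRG", "ACERG"]

scores = subtype_column_scores(subtype_a, subtype_b)
best = max(range(len(scores)), key=scores.__getitem__)
print("highest-scoring column:", best)
```

In this toy case the only column that separates the two groups receives the highest score, while columns with identical residue distributions in both groups score zero. A real analysis would also need a significance estimate for the scores, which this sketch does not attempt.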
Pages: 61-76
Page count: 16