Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches

Cited by: 373
Authors
Aravind, L
Koonin, EV [1]
Affiliations
[1] Natl Lib Med, Natl Ctr Biotechnol Informat, NIH, Bethesda, MD 20894 USA
[2] Texas A&M Univ, Dept Biol, College Stn, TX 77843 USA
Keywords
iterative database search; PSI-BLAST; structure prediction; DNA ligase; sialoglycoprotease
DOI
10.1006/jmbi.1999.2653
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Using a number of diverse protein families as test cases, we investigate the ability of the recently developed iterative sequence database search method, PSI-BLAST, to identify subtle relationships between proteins that originally have been deemed detectable only at the level of structure-structure comparison. We show that PSI-BLAST can detect many, though not all, of such relationships, but the success critically depends on the optimal choice of the query sequence used to initiate the search. Generally, there is a correlation between the diversity of the sequences detected in the first pass of database screening and the ability of a given query to detect subtle relationships in subsequent iterations. Accordingly, a thorough analysis of protein superfamilies at the sequence level is necessary in order to maximize the chances of gleaning non-trivial structural and functional inferences, as opposed to a single search, initiated, for example, with the sequence of a protein whose structure is available. 
This strategy is illustrated by several findings, each of which involves an unexpected structural prediction: (i) a number of previously undetected proteins with the HSP70-actin fold are identified, including a highly conserved and nearly ubiquitous family of metal-dependent proteases (typified by bacterial O-sialoglycoprotease) that represent an adaptation of this fold to a new type of enzymatic activity; (ii) we show that, contrary to the previous conclusions, ATP-dependent and NAD-dependent DNA ligases are confidently predicted to possess the same fold; (iii) the C-terminal domain of 3-phosphoglycerate dehydrogenase, which binds serine and is involved in allosteric regulation of the enzyme activity, is shown to typify a new superfamily of ligand-binding, regulatory domains found primarily in enzymes and regulators of amino acid and purine metabolism; (iv) the immunoglobulin-like DNA-binding domain previously identified in the structures of transcription factors NF-κB and NFAT is shown to be a member of a distinct superfamily of intracellular and extracellular domains with the immunoglobulin fold; and (v) the Rag-2 subunit of the V(D)J recombinase is shown to contain a kelch-type beta-propeller domain, which rules out its evolutionary relationship with bacterial transposases.
Pages: 1023-1040
Page count: 18