SnapDRAGON: a method to delineate protein structural domains from sequence data

Cited by: 68
Authors
George, RA [1]
Heringa, J [1]
Affiliations
[1] Natl Inst Med Res, Div Math Biol, London NW7 1AA, England
Funding
UK Medical Research Council;
Keywords
protein; domain; boundaries; prediction; folding;
DOI
10.1006/jmbi.2001.5387
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
We describe a method to identify protein domain boundaries from sequence information alone based on the assumption that hydrophobic residues cluster together in space. SnapDRAGON is a suite of programs developed to predict domain boundaries based on the consistency observed in a set of alternative ab initio three-dimensional (3D) models generated for a given protein multiple sequence alignment. This is achieved by running a distance geometry-based folding technique in conjunction with a 3D-domain assignment algorithm. The overall accuracy of our method in predicting the number of domains for a non-redundant data set of 414 multiple alignments, representing 185 single and 231 multiple-domain proteins, is 72.4%. Using domain linker regions observed in the tertiary structures associated with each query alignment as the standard of truth, inter-domain boundary positions are delineated with an accuracy of 63.9% for proteins comprising continuous domains only, and 35.4% for proteins with discontinuous domains. Overall, domain boundaries are delineated with an accuracy of 51.8%. The prediction accuracy values are independent of the pair-wise sequence similarities within each of the alignments. These results demonstrate the capability of our method to delineate domains in protein sequences associated with a wide variety of structural domain organisation. © 2002 Elsevier Science Ltd.
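The consistency idea in the abstract (domain boundaries that recur across many alternative 3D models are taken as predictions) can be illustrated with a small sketch. This is not the SnapDRAGON implementation, since the abstract does not specify how boundaries from individual models are combined. The function name `consensus_boundaries` and the parameters `window` and `min_fraction` are hypothetical choices made only for illustration.

```python
def consensus_boundaries(model_boundaries, seq_len, window=10, min_fraction=0.5):
    """Return sequence positions where enough models agree on a domain boundary.

    model_boundaries: one list of boundary positions (1-based) per 3D model.
    window, min_fraction: hypothetical tolerance and agreement threshold,
    not values taken from the paper.
    """
    n_models = len(model_boundaries)
    if n_models == 0:
        return []

    # support[pos] = number of models with a boundary within +/- window of pos
    support = [0] * (seq_len + 2)
    for boundaries in model_boundaries:
        covered = set()
        for b in boundaries:
            covered.update(range(max(1, b - window), min(seq_len, b + window) + 1))
        for pos in covered:
            support[pos] += 1

    threshold = min_fraction * n_models
    consensus = []
    run = []
    # Scan one position past the end so that a trailing run is flushed.
    for pos in range(1, seq_len + 2):
        if pos <= seq_len and support[pos] >= threshold:
            run.append(pos)
        elif run:
            # Report the middle of the best-supported stretch in this run.
            best = max(support[p] for p in run)
            peak = [p for p in run if support[p] == best]
            consensus.append(peak[len(peak) // 2])
            run = []
    return consensus


# Five hypothetical models of a 300-residue protein: three roughly agree.
models = [[100], [104], [98], [], [250]]
boundaries = consensus_boundaries(models, seq_len=300)
predicted_domain_count = len(boundaries) + 1
```

In this toy input, three of the five models place a boundary near residue 100, so a single consensus boundary is reported and the protein is counted as having two domains. The isolated boundary at residue 250 is discarded because only one model supports it.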
Pages: 839-851
Number of pages: 13
Related papers
69 in total
  • [1] Multiple domain protein diagnostic patterns
    Adams, RM
    Das, S
    Smith, TF
    [J]. PROTEIN SCIENCE, 1996, 5 (07) : 1240 - 1249
  • [2] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [3] GLOBAL FOLD DETERMINATION FROM A SMALL NUMBER OF DISTANCE RESTRAINTS
    ASZODI, A
    GRADWELL, MJ
    TAYLOR, WR
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1995, 251 (02) : 308 - 326
  • [4] Hierarchic inertial projection: A fast distance matrix embedding algorithm
    Aszodi, A
    Taylor, WR
    [J]. COMPUTERS & CHEMISTRY, 1997, 21 (01) : 13 - 23
  • [5] Aszódi A, 1997, PROTEINS, P38
  • [6] SECONDARY STRUCTURE FORMATION IN MODEL POLYPEPTIDE-CHAINS
    ASZODI, A
    TAYLOR, WR
    [J]. PROTEIN ENGINEERING, 1994, 7 (05) : 633 - 644
  • [7] FOLDING POLYPEPTIDE ALPHA-CARBON BACKBONES BY DISTANCE GEOMETRY METHODS
    ASZODI, A
    TAYLOR, WR
    [J]. BIOPOLYMERS, 1994, 34 (04) : 489 - 505
  • [8] PROTEIN MODULES
    BARON, M
    NORMAN, DG
    CAMPBELL, ID
    [J]. TRENDS IN BIOCHEMICAL SCIENCES, 1991, 16 (01) : 13 - 17
  • [9] Bateman A, 2004, NUCLEIC ACIDS RES, V32, pD138, DOI [10.1093/nar/gkp985, 10.1093/nar/gkh121, 10.1093/nar/gkr1065]
  • [10] The Protein Data Bank
    Berman, HM
    Westbrook, J
    Feng, Z
    Gilliland, G
    Bhat, TN
    Weissig, H
    Shindyalov, IN
    Bourne, PE
    [J]. NUCLEIC ACIDS RESEARCH, 2000, 28 (01) : 235 - 242