NONGLOBULAR DOMAINS IN PROTEIN SEQUENCES - AUTOMATED SEGMENTATION USING COMPLEXITY-MEASURES

Cited by: 382
Author
WOOTTON, JC
Affiliation
[1] National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bldg. 38A, Room 8N805, Bethesda, MD 20894
Source
COMPUTERS & CHEMISTRY | 1994 / Vol. 18 / No. 03
DOI
10.1016/0097-8485(94)85023-2
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Computational methods based on mathematically-defined measures of compositional complexity have been developed to distinguish globular and non-globular regions of protein sequences. Compact globular structures in protein molecules are shown to be determined by amino acid sequences of high informational complexity. Sequences of known crystal structure in the Brookhaven Protein Data Bank differ only slightly from randomly shuffled sequences in the distribution of statistical properties such as local compositional complexity. In contrast, in the much larger body of deduced sequences in the SWISS-PROT database, approximately one quarter of the residues occur in segments of non-randomly low complexity and approximately half of the entries contain at least one such segment. Sequences of proteins with known, physicochemically-defined non-globular regions have been analyzed, including collagens, different classes of coiled-coil proteins, elastins, histones, non-histone proteins, mucins, proteoglycan core proteins and proteins containing long single solvent-exposed alpha-helices. The SEG algorithm provides an effective general method for partitioning the globular and non-globular regions of these sequences fully automatically. This method is also facilitating the discovery of new classes of long, non-globular sequence segments, as illustrated by the example of the human CAN gene product involved in tumor induction.
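The abstract refers to "local compositional complexity" without giving a formula. As a rough illustration only, the sketch below scores sliding windows by Shannon entropy of their residue composition and merges low-scoring windows into segments. It is not the SEG algorithm as published: the function names, the window length of 12, and the cutoff of 2.2 bits are assumptions chosen for the example, and SEG's own complexity measures and extension steps are not reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (bits per residue) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(sequence, window=12, threshold=2.2):
    """Return (start, entropy) for every window whose entropy is at or below the cutoff.

    `window` and `threshold` are illustrative assumptions, not values from the abstract.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        h = shannon_entropy(sequence[start:start + window])
        if h <= threshold:
            hits.append((start, h))
    return hits


def merge_windows(hits, window):
    """Merge overlapping or adjacent low-entropy windows into half-open (start, end) segments."""
    segments = []
    for start, _ in hits:
        end = start + window
        if segments and start <= segments[-1][1]:
            # Window touches the previous segment: extend it.
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Hypothetical test sequence: a poly-Q run flanked by compositionally diverse stretches.
    seq = "ACDEFGHIKLMNPQRSTVWY" + "Q" * 15 + "ACDEFGHIKLMNPQRSTVWY"
    hits = low_complexity_windows(seq)
    for start, end in merge_windows(hits, window=12):
        print(start, end, seq[start:end])
```

A window made of a single repeated residue scores 0 bits and a window of twelve distinct residues scores log2(12) bits, so a cutoff between those extremes separates repetitive stretches from diverse ones. Where exactly to place it is a tuning decision this sketch does not settle.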
Pages: 269-285
Page count: 17
Related papers
45 in total
  • [1] ISSUES IN SEARCHING MOLECULAR SEQUENCE DATABASES
    ALTSCHUL, SF
    BOGUSKI, MS
    GISH, W
    WOOTTON, JC
    [J]. NATURE GENETICS, 1994, 6 (02) : 119 - 129
  • [2] THE LANGUAGE OF PROTEIN FOLDING - MANY FORKED TONGUES
    ARGOS, P
    [J]. COMPUTERS & CHEMISTRY, 1992, 16 (02) : 93 - 102
  • [3] THE SWISS-PROT PROTEIN-SEQUENCE DATA-BANK, RECENT DEVELOPMENTS
    BAIROCH, A
    BOECKMANN, B
    [J]. NUCLEIC ACIDS RESEARCH, 1993, 21 (13) : 3093 - 3096
  • [4] BALDWIN CT, 1989, J BIOL CHEM, V264, P15747
  • [5] PROTEIN DATA BANK - COMPUTER-BASED ARCHIVAL FILE FOR MACROMOLECULAR STRUCTURES
    BERNSTEIN, FC
    KOETZLE, TF
    WILLIAMS, GJB
    MEYER, EF
    BRICE, MD
    RODGERS, JR
    KENNARD, O
    SHIMANOUCHI, T
    TASUMI, M
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1977, 112 (03) : 535 - 542
  • [6] COMPREHENSIVE SEQUENCE-ANALYSIS OF THE 182 PREDICTED OPEN READING FRAMES OF YEAST CHROMOSOME-III
    BORK, P
    OUZOUNIS, C
    SANDER, C
    SCHARF, M
    SCHNEIDER, R
    SONNHAMMER, E
    [J]. PROTEIN SCIENCE, 1992, 1 (12) : 1677 - 1690
  • [7] BYCHKOVA V E, 1980, Molekulyarnaya Biologiya (Moscow), V14, P278
  • [8] THE CLASSIFICATION AND ORIGINS OF PROTEIN FOLDING PATTERNS
    CHOTHIA, C
    FINKELSTEIN, AV
    [J]. ANNUAL REVIEW OF BIOCHEMISTRY, 1990, 59 : 1007 - 1039
  • [9] ALPHA-HELICAL COILED COILS - MORE FACTS AND BETTER PREDICTIONS
    COHEN, C
    PARRY, DAD
    [J]. SCIENCE, 1994, 263 (5146) : 488 - 489
  • [10] COHEN C, 1990, PROTEINS, V7, P1