Domain size distributions can predict domain boundaries

Cited by: 142
Authors
Wheelan, SJ
Marchler-Bauer, A
Bryant, SH [1]
Affiliations
[1] Natl Lib Med, Natl Ctr Biotechnol Informat, NIH, Bethesda, MD 20894 USA
[2] Johns Hopkins Univ, Sch Med, Dept Mol Biol & Genet, Baltimore, MD 21205 USA
Funding
National Institutes of Health (USA)
DOI
10.1093/bioinformatics/16.7.613
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The sizes of protein domains observed in the 3D-structure database follow a surprisingly narrow distribution. Furthermore, structural domains are formed from a single continuous chain segment in over 80% of instances. These observations imply that, on an otherwise uncharacterized sequence, some choices of domain boundaries are more likely than others, based solely on the size and segment number of the predicted domains. This property might be used to guess the locations of protein domain boundaries.
Results: To test this possibility, we enumerate putative domain boundaries and calculate their relative likelihood under a probability model that considers only the size and segment number of predicted domains. We ask, in a cross-validated test using sequences with known 3D structure, whether the most likely guesses agree with the observed domain structure. We find that domain boundary predictions are surprisingly successful for sequences up to 400 residues long, and that guessing domain boundaries in this way can improve the sensitivity of threading analysis.
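The enumerate-and-score idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the probability model, so the Gaussian size distribution, its parameters (`MEAN_SIZE`, `SD_SIZE`), the minimum domain size, the coarse boundary grid, and all function names are assumptions made here for illustration. The sketch also considers only single-segment (continuous) domains, whereas the paper's model accounts for segment number as well.

```python
import math
from itertools import combinations

# Illustrative assumptions, NOT values from the paper: a Gaussian stands in
# for the empirical domain-size distribution.
MEAN_SIZE = 150.0   # hypothetical typical domain size (residues)
SD_SIZE = 50.0      # hypothetical spread of domain sizes
MIN_SIZE = 40       # hypothetical smallest domain allowed


def log_size_likelihood(size):
    """Log density of the assumed Gaussian size model at a given domain size."""
    z = (size - MEAN_SIZE) / SD_SIZE
    return -0.5 * z * z - math.log(SD_SIZE * math.sqrt(2.0 * math.pi))


def enumerate_partitions(length, max_domains=3, step=10):
    """Yield tuples of cut positions that split a sequence of the given
    length into 1..max_domains continuous domains of at least MIN_SIZE."""
    candidates = range(MIN_SIZE, length - MIN_SIZE + 1, step)
    for n_cuts in range(max_domains):
        for cuts in combinations(candidates, n_cuts):
            edges = (0, *cuts, length)
            if all(b - a >= MIN_SIZE for a, b in zip(edges, edges[1:])):
                yield cuts


def score(cuts, length):
    """Sum of log size-likelihoods over the domains implied by the cuts."""
    edges = (0, *cuts, length)
    return sum(log_size_likelihood(b - a) for a, b in zip(edges, edges[1:]))


def rank_boundaries(length, top=5, **kwargs):
    """Return the `top` highest-scoring (score, cuts) pairs."""
    scored = [(score(c, length), c) for c in enumerate_partitions(length, **kwargs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top]


if __name__ == "__main__":
    for s, cuts in rank_boundaries(400):
        print(f"{s:8.2f}  cuts={cuts}")
```

Under these assumed parameters, a 400-residue sequence is ranked as most likely to consist of two domains split near the midpoint, because a single 400-residue domain lies far out in the tail of the assumed size distribution. Replacing `log_size_likelihood` with an empirical distribution derived from known structures would bring the sketch closer to the approach the abstract describes.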
Pages: 613-618
Page count: 6