Sequence-based prediction of protein domains

被引:66
作者
Liu, JF
Rost, B
机构
[1] Columbia Univ, Dept Biochem & Mol Biophys, CUBIC, New York, NY 10032 USA
[2] Columbia Univ, Ctr Computat Biol & Bioinformat C2B2, New York, NY 10032 USA
[3] Columbia Univ, Dept Biochem & Mol Biophys, NE Struct Gen Consortium, New York, NY 10032 USA
基金
美国国家卫生研究院;
关键词
D O I
10.1093/nar/gkh684
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
Guessing the boundaries of structural domains has been an important and challenging problem in experimental and computational structural biology. Predictions were based on intuition, biochemical properties, statistics, sequence homology and other aspects of predicted protein structure. Here, we introduced CHOPnet, a de novo method that predicts structural domains in the absence of homology to known domains. Our method was based on neural networks and relied exclusively on information available for all proteins. Evaluating sustained performance through rigorous cross-validation on proteins of known structure, we correctly predicted the number of domains in 69% of all proteins. For 50% of the two-domain proteins the centre of the predicted boundary was closer than 20 residues to the boundary assigned from three-dimensional (3D) structures; this was about eight percentage points better than predictions by 'equal split'. Our results appeared to compare favourably with those from previously published methods. CHOPnet may be useful to restrict the experimental testing of different fragments for structure determination in the context of structural genomics.
引用
收藏
页码:3522 / 3530
页数:9
相关论文
共 79 条
[1]   STRUCTURE AT 2.8-ANGSTROM RESOLUTION OF F1-ATPASE FROM BOVINE HEART-MITOCHONDRIA [J].
ABRAHAMS, JP ;
LESLIE, AGW ;
LUTTER, R ;
WALKER, JE .
NATURE, 1994, 370 (6491) :621-628
[2]   The genome sequence of Drosophila melanogaster [J].
Adams, MD ;
Celniker, SE ;
Holt, RA ;
Evans, CA ;
Gocayne, JD ;
Amanatides, PG ;
Scherer, SE ;
Li, PW ;
Hoskins, RA ;
Galle, RF ;
George, RA ;
Lewis, SE ;
Richards, S ;
Ashburner, M ;
Henderson, SN ;
Sutton, GG ;
Wortman, JR ;
Yandell, MD ;
Zhang, Q ;
Chen, LX ;
Brandon, RC ;
Rogers, YHC ;
Blazej, RG ;
Champe, M ;
Pfeiffer, BD ;
Wan, KH ;
Doyle, C ;
Baxter, EG ;
Helt, G ;
Nelson, CR ;
Miklos, GLG ;
Abril, JF ;
Agbayani, A ;
An, HJ ;
Andrews-Pfannkoch, C ;
Baldwin, D ;
Ballew, RM ;
Basu, A ;
Baxendale, J ;
Bayraktaroglu, L ;
Beasley, EM ;
Beeson, KY ;
Benos, PV ;
Berman, BP ;
Bhandari, D ;
Bolshakov, S ;
Borkova, D ;
Botchan, MR ;
Bouck, J ;
Brokstein, P .
SCIENCE, 2000, 287 (5461) :2185-2195
[3]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[4]  
[Anonymous], 1998, SCIENCE, V282, P2012
[5]   GLOBAL FOLD DETERMINATION FROM A SMALL NUMBER OF DISTANCE RESTRAINTS [J].
ASZODI, A ;
GRADWELL, MJ ;
TAYLOR, WR .
JOURNAL OF MOLECULAR BIOLOGY, 1995, 251 (02) :308-326
[6]   The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 [J].
Bairoch, A ;
Apweiler, R .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :45-48
[7]  
Bateman A, 2004, NUCLEIC ACIDS RES, V32, pD138, DOI [10.1093/nar/gkp985, 10.1093/nar/gkr1065, 10.1093/nar/gkh121]
[8]   The Protein Data Bank [J].
Berman, HM ;
Westbrook, J ;
Feng, Z ;
Gilliland, G ;
Bhat, TN ;
Weissig, H ;
Shindyalov, IN ;
Bourne, PE .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :235-242
[9]   The PRESAGE database for structural genomics [J].
Brenner, SE ;
Barken, D ;
Levitt, M .
NUCLEIC ACIDS RESEARCH, 1999, 27 (01) :251-253
[10]   Structure of proteins involved in synaptic vesicle fusion in neurons [J].
Brunger, AT .
ANNUAL REVIEW OF BIOPHYSICS AND BIOMOLECULAR STRUCTURE, 2001, 30 :157-171