Accurate prediction of protein disordered regions by mining protein structure data

被引:158
作者
Cheng, JL [1 ]
Sweredoski, MJ [1 ]
Baldi, P [1 ]
机构
[1] Univ Calif Irvine, Sch Informat & Comp Sci, Inst Genom & Bioinformat, Irvine, CA 92697 USA
基金
美国国家卫生研究院;
关键词
protein structure prediction; disordered regions; recursive neural networks;
D O I
10.1007/s10618-005-0001-y
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Intrinsically disordered regions in proteins are relatively frequent and important for our understanding of molecular recognition and assembly, and protein structure and function. From an algorithmic standpoint, flagging large disordered regions is also important for ab initio protein structure prediction methods. Here we first extract a curated, non-redundant, data set of protein disordered regions from the Protein Data Bank and compute relevant statistics on the length and location of these regions. We then develop an ab initio predictor of disordered regions called DISpro which uses evolutionary information in the form of profiles, predicted secondary structure and relative solvent accessibility, and ensembles of 1D-recursive neural networks. DISpro is trained and cross validated using the curated data set. The experimental results show that DISpro achieves an accuracy of 92.8% with a false positive rate of 5%. DISpro is a member of the SCRATCH suite of protein data mining tools available through http://www.igb.uci.edu/servers/psss.html.
引用
收藏
页码:213 / 222
页数:10
相关论文
共 17 条
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]  
[Anonymous], 2004, J MACH LEARN RES, DOI DOI 10.1162/153244304773936054
[3]   Input-output HMM's for sequence processing [J].
Bengio, Y ;
Frasconi, P .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1996, 7 (05) :1231-1249
[4]   The Protein Data Bank [J].
Berman, HM ;
Westbrook, J ;
Feng, Z ;
Gilliland, G ;
Bhat, TN ;
Weissig, H ;
Shindyalov, IN ;
Bourne, PE .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :235-242
[5]   Intrinsic disorder and protein function [J].
Dunker, AK ;
Brown, CJ ;
Lawson, JD ;
Iakoucheva, LM ;
Obradovic, Z .
BIOCHEMISTRY, 2002, 41 (21) :6573-6582
[6]  
Frasconi P, 2002, NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS, P25, DOI 10.1109/NNSP.2002.1030014
[7]   Protein secondary structure prediction based on position-specific scoring matrices [J].
Jones, DT .
JOURNAL OF MOLECULAR BIOLOGY, 1999, 292 (02) :195-202
[8]   DICTIONARY OF PROTEIN SECONDARY STRUCTURE - PATTERN-RECOGNITION OF HYDROGEN-BONDED AND GEOMETRICAL FEATURES [J].
KABSCH, W ;
SANDER, C .
BIOPOLYMERS, 1983, 22 (12) :2577-2637
[9]  
LI X, 1999, GENOME INFORM, V42, P389
[10]   Protein disorder prediction: Implications for structural proteomics [J].
Linding, R ;
Jensen, LJ ;
Diella, F ;
Bork, P ;
Gibson, TJ ;
Russell, RB .
STRUCTURE, 2003, 11 (11) :1453-1459