DOMpro: Protein domain prediction using profiles, secondary structure, relative solvent accessibility, and recursive neural networks

Cited by: 78
Authors
Cheng, Jianlin [1]
Sweredoski, Michael J. [1]
Baldi, Pierre [1]
Affiliations
[1] Univ Calif Irvine, Sch Informat & Comp Sci, Inst Genom & Bioinformat, Irvine, CA 92697 USA
Funding
National Institutes of Health (USA)
Keywords
protein structure prediction; domain; recursive neural networks
DOI
10.1007/s10618-005-0023-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Protein domains are the structural and functional units of proteins. The ability to parse protein chains into different domains is important for protein classification and for understanding protein structure, function, and evolution. Here we use machine learning algorithms, in the form of recursive neural networks, to develop a protein domain predictor called DOMpro. DOMpro predicts protein domains using a combination of evolutionary information in the form of profiles, predicted secondary structure, and predicted relative solvent accessibility. DOMpro is trained and tested on a curated dataset derived from the CATH database. DOMpro correctly predicts the number of domains for 69% of the combined dataset of single and multi-domain chains. DOMpro achieves a sensitivity of 76% and specificity of 85% with respect to the single-domain proteins and sensitivity of 59% and specificity of 38% with respect to the two-domain proteins. DOMpro also achieved a sensitivity and specificity of 71% and 71% respectively in the Critical Assessment of Fully Automated Structure Prediction 4 (CAFASP-4) (Fischer et al., 1999; Saini and Fischer, 2005) and was ranked among the top ab initio domain predictors. The DOMpro server, software, and dataset are available at http://www.igb.uci.edu/servers/psss.html.
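As an illustration of how per-class figures like those quoted in the abstract could be computed, here is a minimal Python sketch. It assumes that sensitivity means the fraction of chains with a given true domain count that receive that count, and that specificity means the fraction of chains predicted to have that count that truly have it; the abstract does not state these definitions, so treat them as assumptions. The function names and the toy data are hypothetical and are not taken from DOMpro.

```python
def domain_count_metrics(true_counts, predicted_counts, domain_class):
    """Return (sensitivity, specificity) for one domain-count class.

    Assumed definitions (not confirmed by the abstract):
      sensitivity = correct predictions of the class / chains truly in the class
      specificity = correct predictions of the class / chains predicted in the class
    """
    pairs = list(zip(true_counts, predicted_counts))
    correct = sum(1 for t, p in pairs if t == domain_class and p == domain_class)
    actual = sum(1 for t, _ in pairs if t == domain_class)
    predicted = sum(1 for _, p in pairs if p == domain_class)
    sensitivity = correct / actual if actual else float("nan")
    specificity = correct / predicted if predicted else float("nan")
    return sensitivity, specificity


def domain_count_accuracy(true_counts, predicted_counts):
    """Fraction of chains whose predicted number of domains is exactly right."""
    pairs = list(zip(true_counts, predicted_counts))
    return sum(1 for t, p in pairs if t == p) / len(pairs)


# Toy data, invented for illustration only: domain counts for eight chains.
true_counts = [1, 1, 1, 1, 2, 2, 2, 3]
predicted_counts = [1, 1, 1, 2, 2, 1, 2, 2]

single = domain_count_metrics(true_counts, predicted_counts, 1)
double = domain_count_metrics(true_counts, predicted_counts, 2)
overall = domain_count_accuracy(true_counts, predicted_counts)
```

With this toy data, the single-domain class gets 3 of 4 true chains right and 3 of 4 predictions right, which shows how a class can score differently on the two measures once predictions from other classes spill into it.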
Pages: 1-10
Page count: 10
Related papers
35 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]
[Anonymous], 2004, J MACH LEARN RES, DOI 10.1162/153244304773936054
[3]   The Protein Data Bank [J].
Berman, HM ;
Westbrook, J ;
Feng, Z ;
Gilliland, G ;
Bhat, TN ;
Weissig, H ;
Shindyalov, IN ;
Bourne, PE .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :235-242
[4]   Protein structure prediction servers at University College London [J].
Bryson, K ;
McGuffin, LJ ;
Marsden, RL ;
Ward, JJ ;
Sodhi, JS ;
Jones, DT .
NUCLEIC ACIDS RESEARCH, 2005, 33 :W36-W38
[5]   Anger expression toward parents and depressive symptoms among undergraduates in Taiwan [J].
Cheng, HL ;
Mallinckrodt, B ;
Wu, LC .
COUNSELING PSYCHOLOGIST, 2005, 33 (01) :72-97
[6]  
CHENG J, 2005, IN PRESS DATA MINING
[7]   Automated prediction of CASP-5 structures using the Robetta server [J].
Chivian, D ;
Kim, DE ;
Malmström, L ;
Bradley, P ;
Robertson, T ;
Murphy, P ;
Strauss, CEM ;
Bonneau, R ;
Rohl, CA ;
Baker, D .
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2003, 53 :524-533
[8]  
Fischer D, 1999, PROTEINS, P209
[9]   SnapDRAGON: a method to delineate protein structural domains from sequence data [J].
George, RA ;
Heringa, J .
JOURNAL OF MOLECULAR BIOLOGY, 2002, 316 (03) :839-851
[10]  
GEWEHR JE, 2005, IN PRESS BIOINFORMAT