PREDICT-2ND: a tool for generalized protein local structure prediction

Cited by: 31
Authors
Katzman, Sol [1]
Barrett, Christian [2]
Thiltgen, Grant [1]
Karchin, Rachel [3]
Karplus, Kevin [1]
Affiliations
[1] Univ Calif Santa Cruz, Dept Biomol Engn, Santa Cruz, CA 95064 USA
[2] Univ Calif San Diego, Dept Bioengn, La Jolla, CA 92093 USA
[3] Johns Hopkins Univ, Dept Biomed Engn, Inst Computat Med, Baltimore, MD 21218 USA
Funding
U.S. National Institutes of Health
DOI
10.1093/bioinformatics/btn438
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Predictions of protein local structure, derived from sequence alignment information alone, provide visualization tools for biologists to evaluate the importance of amino acid residue positions of interest in the absence of X-ray crystal/NMR structures or homology models. They are also useful as inputs to sequence analysis and modeling tools, such as hidden Markov models (HMMs), which can be used to search for homology in databases of known protein structure. In addition, local structure predictions can be used as a component of cost functions in genetic algorithms that predict protein tertiary structure. We have developed a program (PREDICT-2ND) that trains multilayer neural networks and have applied it to numerous local structure alphabets, tuning network parameters such as the number of layers, the number of units in each layer and the window size of each layer. We have had the most success with four-layer networks whose window sizes increase gradually from layer to layer.
Results: Because the four-layer neural nets occasionally get trapped in poor local optima, our training protocol now uses many different random starts with short training runs, followed by further training of the best-performing networks from the short runs. One recent addition to the program is the option to add a guide sequence to the profile inputs, increasing the number of inputs per position by 20. We find that use of a guide sequence provides a small but consistent improvement in the predictions for several different local-structure alphabets.
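The guide-sequence option mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each profile column holds 20 amino acid probabilities and that the guide residue is appended as a 20-element one-hot vector, which would account for the 20 extra inputs per position. The abstract does not state the encoding, so the one-hot choice, the residue ordering and the function names `one_hot` and `add_guide_sequence` are hypothetical and are not taken from PREDICT-2ND itself.

```python
# Illustrative sketch only: not PREDICT-2ND code.
# Assumption: the guide residue is one-hot encoded over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """Return a 20-element indicator vector (all zeros for an unknown residue)."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def add_guide_sequence(profile, guide):
    """Append the one-hot guide residue to each 20-value profile column."""
    if len(profile) != len(guide):
        raise ValueError("profile and guide sequence must have the same length")
    return [list(column) + one_hot(residue)
            for column, residue in zip(profile, guide)]


# Toy input: a uniform profile over a three-residue guide sequence.
profile = [[1.0 / 20] * 20 for _ in range(3)]
inputs = add_guide_sequence(profile, "MKV")
print(len(inputs), len(inputs[0]))
```

Under these assumptions each position grows from 20 to 40 inputs, and the network sees both the alignment-derived distribution and the identity of the residue in the sequence being predicted.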
Pages: 2453-2459
Page count: 7