Hidden Markov models that use predicted local structure for fold recognition: Alphabets of backbone geometry

Cited by: 142
Authors
Karchin, R [1]
Cline, M
Mandel-Gutfreund, Y
Karplus, K
Affiliations
[1] Univ Calif Santa Cruz, Ctr Biomol Sci & Engn, Jack Baskin Sch Engn, Dept Chem & Biochem, Santa Cruz, CA 95064 USA
[2] Affymetrix Inc, Emeryville, CA USA
Keywords
protein structure prediction; two-track HMM; multitrack HMM; information theory; neural network; alignment; secondary structure
DOI
10.1002/prot.10369
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
An important problem in computational biology is predicting the structure of the large number of putative proteins discovered by genome sequencing projects. Fold-recognition methods attempt to solve the problem by relating the target proteins to known structures, searching for template proteins homologous to the target. Remote homologs that may have significant structural similarity are often not detectable by sequence similarities alone. To address this, we incorporated predicted local structure, a generalization of secondary structure, into two-track profile hidden Markov models (HMMs). We did not rely on a simple helix-strand-coil definition of secondary structure, but experimented with a variety of local structure descriptions, following a principled protocol to establish which descriptions are most useful for improving fold recognition and alignment quality. On a test set of 1298 nonhomologous proteins, HMMs incorporating a 3-letter STRIDE alphabet improved fold recognition accuracy by 15% over amino-acid-only HMMs and 23% over PSI-BLAST, measured by ROC-65 numbers. We compared two-track HMMs to amino-acid-only HMMs on a difficult alignment test set of 200 protein pairs (structurally similar with 3-24% sequence identity). HMMs with a 6-letter STRIDE secondary track improved alignment quality by 62%, relative to DALI structural alignments, while HMMs with an STR track (an expanded DSSP alphabet that subdivides strands into six states) improved by 40% relative to CE.
Pages: 504-514
Page count: 11