A novel fold recognition method using composite predicted secondary structures

Cited by: 14
Authors
An, YL
Friesner, RA [1]
Affiliations
[1] Columbia Univ, Dept Chem, New York, NY 10027 USA
[2] Columbia Univ, Ctr Biomol Simulat, New York, NY 10027 USA
Keywords
fold recognition; structural homologue; composite secondary structure; secondary structure segment; CASP; alignment
DOI
10.1002/prot.10145
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
In this work, we introduce a new method for fold recognition using composite secondary structures assembled from different secondary structure prediction servers for a given target sequence. We develop an automatic, complete, and robust way of finding all possible combinations of predicted secondary structure segments (SSS) for the target sequence and clustering them into a few flexible clusters, each containing patterns with the same number of SSS. The program then chooses plausible homologues in two steps: (i) an SSS-based alignment excludes impossible templates whose SSS patterns are very different from any of those of the target; (ii) a residue-based alignment selects good structural templates based on sequence similarity and secondary structure similarity between the target and only those templates that survive the first stage. The secondary structure of each residue in the target is selected from one of the predictions to find the best match with the template. Truncation is applied to a target where different predictions vary. In most cases, a target is also divided into N-terminal and C-terminal fragments, each of which is used as a separate subsequence. Our program was tested on the fold recognition targets from CASP3 with known PDB codes and some available targets from CASP4. The results are compared with a structural homologue list for each target produced by the CE program (Shindyalov and Bourne, Protein Eng 1998;11:739-747). The program successfully locates homologues with high Z-score and low root-mean-square deviation within the top 30-50 predictions in the overwhelming majority of cases. © 2002 Wiley-Liss, Inc.
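The per-residue selection described in the abstract can be illustrated with a small sketch. This is not the authors' program: it assumes the predictions and the template secondary structure are already aligned, gap-free strings of equal length over a three-state alphabet (H helix, E strand, C coil), and the function names `segments`, `composite`, and `sss_pattern` are hypothetical. The fallback to the first prediction when no predictor agrees with the template is also an assumption made only for illustration.

```python
from itertools import groupby


def segments(ss):
    """Return (state, start, end) for each helix 'H' or strand 'E' run; coil is skipped."""
    out, pos = [], 0
    for state, run in groupby(ss):
        n = len(list(run))
        if state in "HE":
            out.append((state, pos, pos + n))
        pos += n
    return out


def composite(predictions, template_ss):
    """Build a composite string: per residue, take the template's state if any
    predictor proposes it, otherwise fall back to the first prediction
    (an illustrative assumption, not taken from the paper)."""
    chosen = []
    for i, t in enumerate(template_ss):
        column = [p[i] for p in predictions]
        chosen.append(t if t in column else column[0])
    return "".join(chosen)


def sss_pattern(ss):
    """Collapse a per-residue string into its segment pattern, e.g. 'HE'."""
    return "".join(state for state, _, _ in segments(ss))


# Two hypothetical server predictions and one hypothetical template.
predictions = ["CHHHHCCEEEC", "CCHHHCCCEEC"]
template_ss = "CHHHHCCCEEC"
best = composite(predictions, template_ss)
pattern = sss_pattern(best)
```

Patterns with the same number of segments (here `len(pattern)`) could then be grouped together, in the spirit of the clustering step the abstract mentions; how the paper actually defines and scores its clusters is not stated in this record.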
Pages: 352-366
Page count: 15