Development and large scale benchmark testing of the PROSPECTOR_3 threading algorithm

Times cited: 124
Authors
Skolnick, J [1]
Kihara, D [1]
Zhang, Y [1]
Affiliation
[1] SUNY Buffalo, Ctr Excellence Bioinformat, Buffalo, NY 14203 USA
Keywords
protein structure prediction; fold recognition; structural alignment; weakly homologous/analogous proteins; M. genitalium; E. coli; S. cerevisiae; genomes
DOI
10.1002/prot.20106
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
This article describes the PROSPECTOR_3 threading algorithm, which combines various scoring functions designed to match structurally related target/template pairs. For each scoring variant, a Z-score threshold was found above which most identified templates have good structural alignments, Z_struct, and a higher one above which most have good threading alignments, Z_good. 'Easy' targets, which have accurate threading alignments, are identified either by a single template with Z > Z_good or by two templates, each with Z > Z_struct, that share a good consensus structure in their mutually aligned regions. 'Medium' targets have either a pair of templates lacking a consensus structure or a single template with Z_struct < Z < Z_good. PROSPECTOR_3 was applied to a comprehensive Protein Data Bank (PDB) benchmark of 1491 single-domain proteins, 41-200 residues long and no more than 30% identical to any threading template. Of these proteins, 878 were found to be easy targets, 761 of which had a root mean square deviation (RMSD) from native of less than 6.5 Å. The average contact prediction accuracy was 46%, and on average, continuous fragments of 17.6 residues were predicted with an RMSD of 2.0 Å. There were 606 medium targets, 87% (31%) of which had good structural (threading) alignments; on average, continuous fragments of 9.1 residues were predicted with an RMSD of 2.5 Å. Combining the easy and medium sets, 63% (91%) of the targets had good threading (structural) alignments compared to native, with an average target/template sequence identity of 22%. Only nine targets lacked matched templates. Moreover, PROSPECTOR_3 consistently outperforms PSI-BLAST. Similar results were predicted for open reading frames (ORFs) of at most 200 residues in the M. genitalium, E. coli and S. cerevisiae genomes. Thus, progress has been made in identifying weakly homologous/analogous proteins with very high alignment coverage, both in a comprehensive PDB benchmark and in genomes. © 2004 Wiley-Liss, Inc.
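The easy/medium target classification rule described in the abstract can be sketched as follows. This is a minimal Python sketch under stated assumptions, not PROSPECTOR_3's actual implementation: the names `TemplateHit`, `classify_target` and the `has_consensus` callback are hypothetical, each hit simply carries the Z_struct and Z_good thresholds of the scoring variant that produced it (the published values are not reproduced here), ranking hits by their margin over Z_struct is an assumption, and "hard" is used as an assumed catch-all label for targets that fall outside the easy and medium sets.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class TemplateHit:
    """One threading hit (hypothetical record type, not from PROSPECTOR_3)."""
    template_id: str
    z: float         # Z-score of the target/template alignment
    z_struct: float  # variant-specific Z_struct threshold (assumed given)
    z_good: float    # variant-specific Z_good threshold (assumed given)


# Hypothetical callback: True if two templates share a good consensus
# structure in their mutually aligned regions.
ConsensusFn = Callable[[TemplateHit, TemplateHit], bool]


def classify_target(hits: Sequence[TemplateHit], has_consensus: ConsensusFn) -> str:
    # Easy: any single template scoring above its Z_good threshold.
    if any(h.z > h.z_good for h in hits):
        return "easy"

    # Templates above Z_struct, strongest first.
    # Assumption: strength is measured as the margin over Z_struct.
    strong = sorted(
        (h for h in hits if h.z > h.z_struct),
        key=lambda h: h.z - h.z_struct,
        reverse=True,
    )

    if len(strong) >= 2:
        # A pair above Z_struct: easy with consensus, medium without.
        return "easy" if has_consensus(strong[0], strong[1]) else "medium"
    if len(strong) == 1:
        # Medium: single template with Z_struct < Z <= Z_good
        # (the Z_good case was already handled above).
        return "medium"
    # Assumed label for everything outside the easy/medium sets.
    return "hard"
```

For example, with hypothetical thresholds Z_struct = 5.0 and Z_good = 8.0, a lone hit at Z = 6.0 is classified as medium, while two hits at Z = 6.0 and 7.0 count as easy only if `has_consensus` reports a consistent structure in their mutually aligned regions.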
Pages: 502-518
Number of pages: 17