Strategies for high-throughput comparative modeling: Applications to leverage analysis in structural genomics and protein family organization

Cited by: 15
Authors
Mirkovic, Nebojsa [1]
Li, Zhaohui [1]
Parnassa, Andrew [1]
Murray, Diana [1]
Affiliations
[1] Cornell Univ, Weill Med Coll, Dept Microbiol & Immunol, New York, NY 10021 USA
Keywords
homology modeling; target selection; bioinformatics
DOI
10.1002/prot.21191
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
The technological breakthroughs in structural genomics were designed to facilitate the solution of a sufficient number of structures, so that as many protein sequences as possible can be structurally characterized with the aid of comparative modeling. The leverage of a solved structure is the number and quality of the models that can be produced using the structure as a template for modeling and may be viewed as the "currency" with which the success of a structural genomics endeavor can be measured. Moreover, the models obtained in this way should be valuable to all biologists. To this end, at the Northeast Structural Genomics Consortium (NESG), a modular computational pipeline for automated high-throughput leverage analysis was devised and used to assess the leverage of the 186 unique NESG structures solved during the first phase of the Protein Structure Initiative (January 2000 to July 2005). Here, the results of this analysis are presented. The number of sequences in the nonredundant protein sequence database covered by quality models produced by the pipeline is approximately 39,000, so that the average leverage is approximately 210 models per structure. Interestingly, only 7,900 of these models fulfill the stringent modeling criterion of being at least 30% sequence-identical to the corresponding NESG structures. This study shows how high-throughput modeling increases the efficiency of structure determination efforts by providing enhanced coverage of protein structure space. In addition, the approach is useful in refining the boundaries of structural domains within larger protein sequences, subclassifying sequence-diverse protein families, and defining structure-based strategies specific to a particular family.
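The leverage figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration using the rounded numbers from the abstract; the variable names are hypothetical and it is not part of the NESG pipeline.

```python
# Figures quoted in the abstract (rounded there, so results are approximate).
n_structures = 186        # unique NESG structures, PSI phase 1
n_models = 39_000         # sequences covered by quality models (approximate)
n_high_identity = 7_900   # models at least 30% sequence-identical to the template

# Average leverage: quality models produced per solved structure.
average_leverage = n_models / n_structures

# Share of models that meet the stringent 30% identity criterion.
high_identity_fraction = n_high_identity / n_models

print(f"average leverage: {average_leverage:.0f} models per structure")
print(f"high-identity share: {high_identity_fraction:.0%}")
```

With these inputs the average comes to about 210 models per structure, and roughly one model in five meets the 30% identity criterion, which is consistent with the abstract's emphasis on "only 7,900".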
Pages: 766-777
Page count: 12