Progress of structural genomics initiatives: An analysis of solved target structures

Times cited: 99
Authors
Todd, AE
Marsden, RL
Thornton, JM
Orengo, CA
Affiliations
[1] UCL, Dept Biochem & Mol Biol, London WC1E 6BT, England
[2] European Bioinformat Inst, Cambridge CB10 1SD, England
Funding
National Institutes of Health (USA)
Keywords
fold; novelty; protein structure; structural genomics; superfamily
DOI
10.1016/j.jmb.2005.03.037
Chinese Library Classification codes
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often-stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess the progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated effort, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, while 19% and 11% contributed new superfamilies and new folds, respectively. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH, but the distribution is highly skewed: the most populous folds and superfamilies are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life, and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes.
From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field. (c) 2005 Elsevier Ltd. All rights reserved.
Pages: 1235-1260
Page count: 26
References: 133