Exploration of Uncharted Regions of the Protein Universe

Cited by: 99
Authors
Jaroszewski, Lukasz [1]
Li, Zhanwen [2]
Krishna, S. Sri [1]
Bakolitsa, Constantina [1]
Wooley, John [3]
Deacon, Ashley M. [4]
Wilson, Ian A. [5]
Godzik, Adam [1, 2, 3]
Affiliations
[1] Burnham Inst Med Res, Joint Ctr Struct Genom, La Jolla, CA USA
[2] Burnham Inst Med Res, Joint Ctr Mol Modeling, La Jolla, CA USA
[3] Univ Calif San Diego, Ctr Res Biol Syst, Joint Ctr Struct Genom, La Jolla, CA 92093 USA
[4] SLAC Natl Accelerator Lab, Stanford Synchrotron Radiat Lightsource, Joint Ctr Struct Genom, Menlo Pk, CA USA
[5] Scripps Res Inst, Joint Ctr Struct Genom, La Jolla, CA 92037 USA
Funding
National Institutes of Health (NIH), USA
Keywords
CONSERVED HYPOTHETICAL PROTEINS; FUNCTION PREDICTION; GENE-EXPRESSION; DATABASE; METAPROTEOMICS; NUMBER; FOLDS; EVOLUTION; FAMILIES; SPACE;
DOI
10.1371/journal.pbio.1000205
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The genome projects have unearthed an enormous diversity of genes of unknown function that are still awaiting biological and biochemical characterization. These genes, like most others, can be grouped into families based on sequence similarity. The PFAM database currently contains over 2,200 such families, referred to as domains of unknown function (DUF). In a coordinated effort, the four large-scale centers of the NIH Protein Structure Initiative have determined the first three-dimensional structures for more than 250 of these DUF families. Analysis of the first 248 reveals that about two thirds of the DUF families likely represent very divergent branches of already known and well-characterized families, which allows hypotheses to be formulated about their biological function. The remainder can be formally categorized as new folds, although about one third of these show significant substructure similarity to previously characterized folds. These results suggest that, despite the enormous increase in the number and diversity of new genes being uncovered, the fold space of the proteins they encode is gradually becoming saturated. The previously unexplored sectors of the protein universe appear to be shaped primarily by extreme diversification of known protein families, which in turn enables organisms to evolve new functions and adapt to particular niches and habitats. Nevertheless, these DUF families remain the richest source for discovery of the remaining protein folds and topologies.
Pages: 15