Towards completion of the Earth's proteome

Cited by: 23
Authors
Perez-Iratxeta, Carolina [1]
Palidwor, Gareth [1]
Andrade-Navarro, Miguel A. [1, 2, 3]
Affiliations
[1] Ottawa Hlth Res Inst, Dept Mol Med, Ottawa, ON K1H 8L6, Canada
[2] Univ Ottawa, Fac Med, Dept Cellular & Mol Med, Ottawa, ON K1H 8L6, Canada
[3] Max Delbruck Ctr Mol Med, D-13125 Berlin, Germany
Keywords
protein sequence database; genomics; sequencing project; database annotation; phylogenetic analysis;
DOI
10.1038/sj.embor.7401117
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
New protein sequences are deposited in databases at an accelerating pace; however, many of these are homologous to known proteins and could be considered redundant. If all historical releases of the protein database are analysed using the original sequence-clustering procedure described here, the fraction of newly sequenced proteins that are redundant is increasing. We interpret this as an indication that the sequencing of the Earth's proteome (the complete set of proteins on Earth) is approaching completion. We estimate the approximate size of the Earth's proteome to be 5 million sequences, most of which will be identified during the next 5 years. As the Earth's proteome nears completion, cluster analysis of the protein database will become essential to identify under-explored taxa to which future sequencing efforts should be directed and to focus research on protein families without experimental characterization.
Pages: 1135-1141
Page count: 7
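
The abstract's central measurement, the fraction of newly deposited proteins that fall into already-known sequence clusters, can be illustrated with a toy calculation. The sketch below is not the authors' clustering procedure: it assumes cluster labels have already been assigned by some method, and the function name `redundant_fraction_per_release` and the data in `toy_releases` are hypothetical, chosen only for illustration.

```python
def redundant_fraction_per_release(releases):
    """Return, for each release, the fraction of its new sequences whose
    cluster was already present in an earlier release.

    `releases` is a list of lists; each inner list holds the (hypothetical)
    cluster label of every sequence newly added in that release.
    A cluster that first appears within a release counts as novel for
    every sequence of that release, since only earlier releases are checked.
    """
    seen_clusters = set()
    fractions = []
    for release in releases:
        if not release:
            # An empty release contributes no sequences; report 0.0.
            fractions.append(0.0)
            continue
        redundant = sum(1 for cluster in release if cluster in seen_clusters)
        fractions.append(redundant / len(release))
        # Only after scoring the release do its clusters become "known".
        seen_clusters.update(release)
    return fractions


# Hypothetical toy data, not taken from the paper.
toy_releases = [
    ["A", "B", "C", "D"],
    ["A", "E", "B", "F"],
    ["A", "B", "E", "G"],
    ["C", "D", "E", "F"],
]

fractions = redundant_fraction_per_release(toy_releases)
print(fractions)
```

In this toy series the redundant fraction rises from release to release, which is the kind of trend the abstract interprets as a sign of approaching saturation; estimating a total proteome size from such a curve would need a fitted model that this sketch does not attempt.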