Towards completion of the Earth's proteome

Cited by: 23
Authors
Perez-Iratxeta, Carolina [1]
Palidwor, Gareth [1]
Andrade-Navarro, Miguel A. [1, 2, 3]
Affiliations
[1] Ottawa Hlth Res Inst, Dept Mol Med, Ottawa, ON K1H 8L6, Canada
[2] Univ Ottawa, Fac Med, Dept Cellular & Mol Med, Ottawa, ON K1H 8L6, Canada
[3] Max Delbruck Ctr Mol Med, D-13125 Berlin, Germany
Keywords
protein sequence database; genomics; sequencing project; database annotation; phylogenetic analysis
DOI
10.1038/sj.embor.7401117
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
New protein sequences are deposited in databases at an accelerating pace; however, many of these are homologous to known proteins and could be considered redundant. Analysis of all historical releases of the protein database with the original sequence-clustering procedure described here shows that the fraction of newly sequenced proteins that are redundant is increasing. We interpret this as an indication that the sequencing of the Earth's proteome (the complete set of proteins on Earth) is approaching completion. We estimate the approximate size of the Earth's proteome to be 5 million sequences, most of which will be identified during the next 5 years. As the Earth's proteome nears completion, cluster analysis of the protein database will become essential to identify under-explored taxa to which future sequencing efforts should be directed and to focus research on protein families without experimental characterization.
Pages: 1135-1141
Page count: 7