Large-Scale Comparative Genomic Ranking of Taxonomically Restricted Genes (TRGs) in Bacterial and Archaeal Genomes

Cited by: 21
Authors
Wilson, Gareth A. [1 ]
Feil, Edward J. [2 ]
Lilley, Andrew K. [1 ]
Field, Dawn [1 ]
Affiliations
[1] CEH, Oxford, England
[2] Univ Bath, Dept Biol & Biochem, Bath BA2 7AY, Avon, England
Source
PLOS ONE | 2007 / Volume 2 / Issue 03
Funding
Natural Environment Research Council (UK);
Keywords
DOI
10.1371/journal.pone.0000324
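Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes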
07 ; 0710 ; 09 ;
Abstract
Background. Lineage-specific, or taxonomically restricted genes (TRGs), especially those that are species and strain-specific, are of special interest because they are expected to play a role in defining exclusive ecological adaptations to particular niches. Despite this, they are relatively poorly studied and little understood, in large part because many are still orphans or only have homologues in very closely related isolates. This lack of homology confounds attempts to establish the likelihood that a hypothetical gene is expressed and, if so, to determine the putative function of the protein.
Methodology/Principal Findings. We have developed "QIPP" ("Quality Index for Predicted Proteins"), an index that scores the "quality" of a protein based on non-homology-based criteria. QIPP can be used to assign a value between zero and one to any protein based on comparing its features to other proteins in a given genome. We have used QIPP to rank the predicted proteins in the proteomes of Bacteria and Archaea. This ranking reveals that there is a large amount of variation in QIPP scores, and identifies many high-scoring orphans as potentially "authentic" (expressed) orphans. There are significant differences in the distributions of QIPP scores between orphan and non-orphan genes for many genomes and a trend for less well-conserved genes to have lower QIPP scores.
Conclusions. The implication of this work is that QIPP scores can be used to further annotate predicted proteins with information that is independent of homology. Such information can be used to prioritize candidates for further analysis. Data generated for this study can be found in the OrphanMine at http://www.genomics.ceh.ac.uk/orphan_mine.
Pages: 10