Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context

Citations: 259
Authors
Wolf, YI [1]
Rogozin, IB [1]
Kondrashov, AS [1]
Koonin, EV [1]
Affiliations
[1] NIH, Natl Ctr Biotechnol Informat, Natl Lib Med, Bethesda, MD 20894 USA
DOI
10.1101/gr.GR-1619R
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Gene order in prokaryotes is conserved to a much lesser extent than protein sequences. Only a few operons, primarily those that code for physically interacting proteins, are conserved in all or most of the bacterial and archaeal genomes. Nevertheless, even the limited conservation of operon organization that is observed can provide valuable evolutionary and functional clues through multiple genome comparisons. A program for constructing gapped local alignments of conserved gene strings in two genomes was developed. The statistical significance of the local alignments was assessed using Monte Carlo simulations. Sets of local alignments were generated for all pairs of completely sequenced bacterial and archaeal genomes, and for each genome a template-anchored multiple alignment was constructed. In most pairwise genome comparisons, <10% of the genes in each genome belonged to conserved gene strings. When closely related pairs of species (i.e., two mycoplasmas) are excluded, the total coverage of genomes by conserved gene strings ranged from <5% for the cyanobacterium Synechocystis sp. to 24% for the minimal genome of Mycoplasma genitalium and 23% for Thermotoga maritima. The coverage of the archaeal genomes was only slightly lower than that of bacterial genomes. The majority of the conserved gene strings are known operons, with the ribosomal superoperon being the top-scoring string in most genome comparisons. However, in some of the bacterial-archaeal pairs, the superoperon is rearranged to the extent that other operons, primarily those subject to horizontal transfer, such as the archaeal-type H+-ATPase operon or ABC-type transport cassettes, show the greatest level of conservation. The level of gene order conservation among prokaryotic genomes was compared to the co-occurrence of genomes in clusters of orthologous genes (COGs) and to the conservation of protein sequences themselves. Only limited correlation was observed between these evolutionary variables.
Gene order conservation shows a much lower variance than the co-occurrence of genomes in COGs, which indicates that intragenome homogenization via recombination occurs in evolution much faster than intergenome homogenization via horizontal gene transfer and lineage-specific gene loss. The potential of using template-anchored multiple-genome alignments for predicting functions of uncharacterized genes was quantitatively assessed. Functions were predicted or significantly clarified for ~90 COGs (~4% of the total of 2414 analyzed COGs). The most significant predictions were obtained for the poorly characterized archaeal genomes; these include a previously uncharacterized restriction-modification system, a nuclease-helicase combination implicated in DNA repair, and the probable archaeal counterpart of the eukaryotic exosome. Multiple genome alignments are a resource for studies on operon rearrangement and disruption, which is central to our understanding of the evolution of prokaryotic genomes. Because of the rapid evolution of gene order, the potential of genome alignment for prediction of gene functions is limited; nevertheless, such predictions significantly complement the results obtained through protein sequence and structure analysis.
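The abstract mentions gapped local alignment of gene strings and a Monte Carlo assessment of significance but gives no algorithmic detail. The following is a minimal sketch of that general idea, not the authors' program. It assumes each genome is reduced to an ordered list of ortholog-family labels (for example COG identifiers), uses Smith-Waterman-style scoring with illustrative parameter values, and uses random shuffling of gene order as the null model. The function names `local_align` and `monte_carlo_p`, the scores, and the gene labels are all hypothetical; gene orientation and circular chromosomes are ignored.

```python
import random


def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Best gapped local alignment score between two gene strings.

    a, b: sequences of ortholog-family labels in chromosomal order.
    Scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: scores never drop below zero.
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def monte_carlo_p(a, b, trials=200, seed=0):
    """Empirical p-value of the observed score under shuffled gene order.

    Null model (an assumption): gene order in b is a random permutation.
    """
    rng = random.Random(seed)
    observed = local_align(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if local_align(a, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)


# Hypothetical gene strings: genome_b lacks gene "C", forcing one gap.
genome_a = ["A", "B", "C", "D"]
genome_b = ["X", "A", "B", "D", "Y"]
score, p_value = monte_carlo_p(genome_a, genome_b)
```

In this toy case the conserved string A-B-D is recovered with one gap; on real genomes the same procedure would be run over much longer label lists, and the shuffling null would likely need refinement.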
Pages: 356-372
Page count: 17