SIGI: score-based identification of genomic islands

Cited by: 62
Author
Merkl, R
Affiliations
[1] Univ Gottingen, Inst Mikrobiol & Genet, Abt Mol Genet & Praparat Mol Biol, D-37077 Gottingen, Germany
[2] Gottingen Genom Lab, D-37077 Gottingen, Germany
DOI
10.1186/1471-2105-5-22
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Genomic islands can be observed in many microbial genomes. These stretches of DNA have a conspicuous composition with regard to sequence or encoded functions. Genomic islands are assumed to be frequently acquired via horizontal gene transfer. For the analysis of genome structure and the study of horizontal gene transfer, it is necessary to reliably identify and characterize these islands.
Results: A scoring scheme based on codon frequencies, Score_G1G2(cdn) = log(f_G2(cdn)/f_G1(cdn)), was used. To analyse the genes of a species G1 and to test their relatedness to a species G2, scores were computed as log-odds derived from the mean codon frequencies of the two genomes. A non-redundant set of nearly 400 codon usage tables of microbial species was derived; each member was used in turn at position G2. Genes having at least one score value above a species-specific, dynamically determined cut-off value were analysed further. By means of cluster analysis, genes were identified that form clusters of statistically significant size. These clusters were predicted to be genomic islands. Finally, and individually for each of these genes, the taxonomic relation among the species responsible for significant scores was interpreted. The validity of the approach and its limitations were made plausible by an extensive analysis of natural genes and of synthetic genes designed to model the process of gene amelioration.
Conclusions: The method reliably identifies genomic islands and the likely origin of alien genes.
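The per-codon score in the abstract can be illustrated with a minimal Python sketch. It is not the SIGI implementation: the function names, the toy frequency tables, and the choice to sum per-codon scores into a gene-level score are assumptions made here for illustration. The abstract only states the per-codon formula.

```python
import math


def codon_scores(freq_g1, freq_g2):
    """Per-codon log-odds, log(f_G2(cdn) / f_G1(cdn)), as in the abstract.

    Codons missing from either table, or with zero frequency, are skipped
    (an assumption of this sketch, to avoid log(0) and division by zero).
    """
    return {
        cdn: math.log(freq_g2[cdn] / freq_g1[cdn])
        for cdn in freq_g1
        if freq_g1[cdn] > 0 and freq_g2.get(cdn, 0) > 0
    }


def gene_score(sequence, scores):
    """Sum per-codon scores over the in-frame codons of a gene.

    Summation is an assumption of this sketch; unknown codons contribute 0.
    """
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    return sum(scores.get(cdn, 0.0) for cdn in codons)


# Hypothetical toy codon frequencies for two genomes G1 and G2.
f_g1 = {"GCT": 0.4, "GCC": 0.1, "AAA": 0.5}
f_g2 = {"GCT": 0.1, "GCC": 0.4, "AAA": 0.5}

scores = codon_scores(f_g1, f_g2)
native_like = gene_score("GCTGCTAAA", scores)  # uses codons preferred in G1
alien_like = gene_score("GCCGCCAAA", scores)   # uses codons preferred in G2
```

In this toy setting a gene built from codons that G2 prefers receives a positive score and a gene built from codons that G1 prefers receives a negative one. The species-specific cut-off and the cluster analysis described in the abstract are not modelled here.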
Pages: 14