Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models

Cited by: 294
Authors
Waack, Stephan
Keller, Oliver
Asper, Roman
Brodag, Thomas
Damm, Carsten
Fricke, Wolfgang Florian
Surovcik, Katharina
Meinicke, Peter
Merkl, Rainer
Affiliations
[1] Univ Regensburg, Inst Biophys & Phys Biochem, D-93053 Regensburg, Germany
[2] Univ Gottingen, Inst Informat, D-37083 Gottingen, Germany
[3] Univ Gottingen, Inst Numer & Angew Math, D-37083 Gottingen, Germany
[4] Univ Gottingen, Gottingen Genom Lab, D-37077 Gottingen, Germany
[5] Univ Gottingen, Inst Mikrobiol & Genet, D-37077 Gottingen, Germany
DOI
10.1186/1471-2105-7-142
CLC number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Horizontal gene transfer (HGT) is considered a strong evolutionary force that substantially shapes the content of microbial genomes. What distinguishes HGT from gene genesis, duplications or mutations is its speed, which enables rapid adaptation to changing environmental demands. For a precise characterization, algorithms are needed that identify transfer events with high reliability. Frequently, the transferred pieces of DNA have a considerable length, comprise several genes and are called genomic islands (GIs) or, more specifically, pathogenicity or symbiotic islands.
Results: We have implemented the program SIGI-HMM, which predicts GIs and the putative donor of each individual alien gene. It is based on the analysis of the codon usage (CU) of each individual gene of a genome under study. The CU of each gene is compared against a carefully selected set of CU tables representing microbial donors or highly expressed genes. Multiple tests are used to identify putatively alien genes, to predict putative donors and to mask putatively highly expressed genes. In this way, we determine the states and emission probabilities of an inhomogeneous hidden Markov model working at the gene level. For the transition probabilities, we draw upon classical test theory with the intention of integrating a sensitivity controller in a consistent manner. SIGI-HMM was written in Java and is publicly available. It accepts as input any file in EMBL format and generates output in the common GFF format, which is readable by genome browsers. Benchmark tests showed that the output of SIGI-HMM is in agreement with known findings. Its predictions were consistent both with annotated GIs and with predictions generated by different methods.
Conclusion: SIGI-HMM is a sensitive tool for the identification of GIs in microbial genomes. It allows genomes to be analyzed interactively in detail and hypotheses about the origin of acquired genes to be generated or tested.
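The general idea described in the abstract (score each gene by codon usage, then smooth the per-gene calls with a hidden Markov model) can be illustrated with a minimal Python sketch. This is not the SIGI-HMM implementation: the abstract describes an inhomogeneous model with several donor tables and transition probabilities derived from test theory, whereas the sketch assumes a single donor table, a plain two-state model and a fixed `stay` probability. The function names, the `floor` pseudo-frequency and the toy numbers are all illustrative assumptions.

```python
import math
from collections import Counter


def codon_counts(seq):
    """Count codons in a coding sequence; a trailing partial codon is ignored."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def log_odds(gene, host_freq, donor_freq, floor=1e-6):
    """Sum of log(donor/host) codon frequencies over the gene.

    Positive values mean the (hypothetical) donor table explains the gene
    better than the host table. `floor` stands in for unseen codons.
    """
    score = 0.0
    for codon, n in codon_counts(gene).items():
        p_donor = donor_freq.get(codon, floor)
        p_host = host_freq.get(codon, floor)
        score += n * math.log(p_donor / p_host)
    return score


def viterbi_two_state(scores, stay=0.9):
    """Label each gene 'native' or 'alien' with a two-state Viterbi pass.

    Emissions are relative log-likelihoods: 0 for 'native' and the gene's
    log-odds score for 'alien'. `stay` is an assumed, fixed probability of
    remaining in the same state from one gene to the next.
    """
    if not scores:
        return []
    states = ("native", "alien")
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    best = [math.log(0.5), math.log(0.5) + scores[0]]
    back = []
    for s in scores[1:]:
        emit = (0.0, s)
        new_best, ptr = [], []
        for j in range(2):
            cands = [best[i] + (log_stay if i == j else log_switch)
                     for i in range(2)]
            i_best = 0 if cands[0] >= cands[1] else 1
            new_best.append(cands[i_best] + emit[j])
            ptr.append(i_best)
        best = new_best
        back.append(ptr)
    # Trace back from the better final state.
    state = 0 if best[0] >= best[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return [states[k] for k in reversed(path)]
```

In this toy setting, a weakly native-looking gene flanked by strongly alien-looking genes is still assigned to the island, because switching states twice costs more than its small negative score. That is the smoothing effect a gene-level HMM provides over independent per-gene tests.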
Pages: 12
Related papers
61 in total
[1]  
[Anonymous], 1997, THESIS STANFORD U
[2]   Use of artificial genomes in assessing methods for atypical gene detection [J].
Azad, RK ;
Lawrence, JG .
PLOS COMPUTATIONAL BIOLOGY, 2005, 1 (06) :461-473
[3]   GenBank [J].
Benson, DA ;
Karsch-Mizrachi, I ;
Lipman, DJ ;
Ostell, J ;
Rapp, BA ;
Wheeler, DL .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :15-18
[4]   Comparative genomic structure of prokaryotes [J].
Bentley, SD ;
Parkhill, J .
ANNUAL REVIEW OF GENETICS, 2004, 38 :771-792
[5]   The complete genome sequence of Propionibacterium acnes, a commensal of human skin [J].
Brüggemann, H ;
Henne, A ;
Hoster, F ;
Liesegang, H ;
Wiezer, A ;
Strittmatter, A ;
Hujer, S ;
Dürre, P ;
Gottschalk, G .
SCIENCE, 2004, 305 (5684) :671-673
[6]   Prediction of complete gene structures in human genomic DNA [J].
Burge, C ;
Karlin, S .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 268 (01) :78-94
[7]   Prophages and bacterial genomics: what have we learned so far? [J].
Casjens, S .
MOLECULAR MICROBIOLOGY, 2003, 49 (02) :277-300
[8]   Hairpin telomeres and genome plasticity in Borrelia: all mixed up in the end [J].
Chaconas, G .
MOLECULAR MICROBIOLOGY, 2005, 58 (03) :625-635
[9]   The genome of the heartwater agent Ehrlichia ruminantium contains multiple tandem repeats of actively variable copy number [J].
Collins, NE ;
Liebenberg, J ;
de Villiers, EP ;
Brayton, KA ;
Louw, E ;
Pretorius, A ;
Faber, FE ;
van Heerden, H ;
Josemans, A ;
van Kleef, M ;
Steyn, HC ;
van Strijp, MF ;
Zweygarth, E ;
Jongejan, F ;
Maillard, JC ;
Berthier, D ;
Botha, M ;
Joubert, F ;
Corton, CH ;
Thomson, NR ;
Allsopp, MT ;
Allsopp, BA .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2005, 102 (03) :838-843
[10]   G+C3 structuring along the genome: A common feature in prokaryotes [J].
Daubin, V ;
Perrière, G .
MOLECULAR BIOLOGY AND EVOLUTION, 2003, 20 (04) :471-483