Operon prediction for sequenced bacterial genomes without experimental information

Cited by: 29
Authors
Bergman, Nicholas H.
Passalacqua, Karla D.
Hanna, Philip C.
Qin, Zhaohui S.
Affiliations
[1] Univ Michigan, Sch Med, Bioinformat Program, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Sch Med, Dept Microbiol & Immunol, Ann Arbor, MI 48109 USA
[3] Univ Michigan, Sch Publ Hlth, Ctr Stat Genet, Ann Arbor, MI 48109 USA
[4] Univ Michigan, Sch Publ Hlth, Dept Biostat, Ann Arbor, MI 48109 USA
DOI
10.1128/AEM.01686-06
CLC classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Various computational approaches have been proposed for operon prediction, but most algorithms rely on experimental or functional data that are only available for a small subset of sequenced genomes. In this study, we explored the possibility of using phylogenetic information to aid in operon prediction, and we constructed a Bayesian hidden Markov model that incorporates comparative genomic data with traditional predictors, such as intergenic distances. The prediction algorithm performs as well as the best previously reported method, with several significant advantages. It uses fewer data sources and so is easier to implement, and it is more broadly applicable than previous methods: it can be applied to essentially every gene in any sequenced bacterial genome. Furthermore, we show that near-optimal performance is easily reached with a generic set of comparative genomes and does not depend on a specific relationship between the subject genome and the comparative set. We applied the algorithm to the Bacillus anthracis genome and found that it successfully predicted all previously verified B. anthracis operons. To further test its performance, we chose a predicted operon (BA1489-92) containing several genes with little apparent functional relatedness and tested whether they are cotranscribed. Experimental evidence shows that these genes are cotranscribed, and the data have interesting implications for B. anthracis biology. Overall, our findings show that this algorithm is capable of highly sensitive and accurate operon prediction in a wide range of bacterial genomes and that these predictions can lead to the rapid discovery of new functional relationships among genes.
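To illustrate the general idea of a hidden Markov model over intergenic distances, here is a toy sketch in Python. It is not the authors' Bayesian model and uses no comparative genomic data. The two states, the distance bins, every probability value, and the names `bin_distance`, `viterbi`, and `group_operons` are assumptions made up for illustration. Each adjacent same-strand gene pair is labelled either "within" an operon or at a "boundary", and the most likely labelling is found by Viterbi decoding.

```python
import math

# Hypothetical, hand-picked parameters for illustration only (not from the paper).
STATES = ("within", "boundary")
START = {"within": 0.5, "boundary": 0.5}
TRANS = {
    "within": {"within": 0.7, "boundary": 0.3},
    "boundary": {"within": 0.5, "boundary": 0.5},
}
# Emission probabilities over binned intergenic distance.
EMIT = {
    "within": {"short": 0.7, "medium": 0.2, "long": 0.1},
    "boundary": {"short": 0.1, "medium": 0.3, "long": 0.6},
}


def bin_distance(bp):
    """Map an intergenic distance in base pairs to a coarse bin (assumed cutoffs)."""
    if bp < 50:
        return "short"
    if bp < 200:
        return "medium"
    return "long"


def viterbi(distances):
    """Return the most likely state for each adjacent gene pair."""
    obs = [bin_distance(d) for d in distances]
    if not obs:
        return []
    # Log-probabilities avoid underflow on long gene runs.
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
            )
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def group_operons(genes, distances):
    """Split an ordered run of same-strand genes into predicted operons."""
    if not genes:
        return []
    labels = viterbi(distances)
    operons = [[genes[0]]]
    for gene, label in zip(genes[1:], labels):
        if label == "within":
            operons[-1].append(gene)
        else:
            operons.append([gene])
    return operons
```

With placeholder gene names, `group_operons(["g1", "g2", "g3", "g4", "g5"], [10, 5, 400, 20])` splits the run at the 400 bp gap. A model of the kind described in the abstract would additionally condition on comparative genomic evidence and estimate its parameters from data rather than fixing them by hand.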
Pages: 846-854
Number of pages: 9