Using functional and organizational information to improve genome-wide computational prediction of transcription units on pathway-genome databases

Citations: 61
Authors
Romero, PR [1]
Karp, PD [1]
Affiliation
[1] SRI International, Artificial Intelligence Center, Bioinformatics Research Group, Menlo Park, CA, USA
DOI
10.1093/bioinformatics/btg471
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The prediction of transcription units (TUs, which are similar to operons) is an important problem that has been tackled using many different approaches. The availability of complete microbial genomes has made genome-wide TU prediction possible. Pathway-genome databases (PGDBs) add metabolic and other organizational (i.e. protein complexes) information to the annotated genome, and can also capture TU organization. These characteristics make PGDBs a suitable framework for developing and implementing TU predictors.

Results: We implemented a TU predictor that uses only intergenic distance and the functional classification of genes to predict TU boundaries, and applied it to EcoCyc, our PGDB of Escherichia coli. To this original predictor we added information on metabolic pathways, protein complexes and transporters, all readily available in EcoCyc, to produce an enhanced predictor. The enhanced predictor correctly predicted 80% of the known E. coli TUs (69% of the known operons), a moderate improvement over the original predictor (75% of TUs and 65% of operons correctly predicted), demonstrating that the extra information available in the PGDB does improve prediction performance. To test this E. coli-based predictor on a different genome, we applied it to BsubCyc, our computationally generated PGDB for Bacillus subtilis, for which a set of 100 known operons is available. Prediction accuracy decreased substantially (46% of the known operons correctly predicted), in part because information missing from BsubCyc prevented full use of the predictor's features. The enhanced predictor has been implemented as part of our Pathway Tools software suite, and can be used to populate a PGDB with predicted TUs.
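As an illustration only, the following Python sketch shows how intergenic distance and shared functional or organizational annotations could be combined to place TU boundaries. It is not the authors' predictor and does not use the Pathway Tools API: the `Gene` record, the `same_tu` rule, and the thresholds `max_gap` and `relaxed_gap` are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical gene record; field names are assumptions for this sketch."""
    name: str
    start: int
    end: int
    strand: str                        # "+" or "-"
    functional_class: str = ""
    pathways: frozenset = frozenset()
    complexes: frozenset = frozenset()


def same_tu(a, b, max_gap=50, relaxed_gap=200):
    """Decide whether adjacent genes a and b (a upstream of b) share a TU.

    The gap thresholds are illustrative values, not taken from the paper.
    """
    if a.strand != b.strand:
        return False
    gap = b.start - a.end - 1          # intergenic distance in base pairs
    if gap <= max_gap:
        return True
    # A larger gap is tolerated only when the genes share an annotation.
    shares_context = (
        (a.functional_class != "" and a.functional_class == b.functional_class)
        or bool(a.pathways & b.pathways)
        or bool(a.complexes & b.complexes)
    )
    return shares_context and gap <= relaxed_gap


def predict_tus(genes):
    """Group genes, ordered by start coordinate, into predicted TUs."""
    tus = []
    for gene in sorted(genes, key=lambda g: g.start):
        if tus and same_tu(tus[-1][-1], gene):
            tus[-1].append(gene)
        else:
            tus.append([gene])
    return [[g.name for g in tu] for tu in tus]


if __name__ == "__main__":
    demo = [
        Gene("geneA", 1, 100, "+"),
        Gene("geneB", 130, 300, "+", pathways=frozenset({"P"})),
        Gene("geneC", 420, 600, "+", pathways=frozenset({"P"})),
        Gene("geneD", 1000, 1200, "+"),
        Gene("geneE", 1210, 1400, "-"),
    ]
    print(predict_tus(demo))
```

In this toy input, geneB and geneC are separated by more than `max_gap` but are still grouped because they share a pathway, which mirrors the idea that pathway and complex information can refine a distance-only prediction.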
Pages: 709 / U342
Number of pages: 35