Development of joint application strategies for two microbial gene finders

Cited by: 59
Authors
McHardy, AC
Goesmann, A
Pühler, A
Meyer, F [1]
Affiliations
[1] Univ Bielefeld, Dept Biol, Ctr Biotechnol, D-33594 Bielefeld, Germany
[2] Univ Bielefeld, Dept Biol, Lehrstuhl Genet, D-33594 Bielefeld, Germany
DOI
10.1093/bioinformatics/bth137
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: As a starting point in the annotation of bacterial genomes, gene finding programs are used to predict functional elements in the DNA sequence. With the faster pace and growing number of genome projects currently underway, accurate and efficient methods for this task are becoming especially important.
Results: This study describes the development of joint application strategies that combine the strengths of two microbial gene finders to improve overall gene finding performance. Critica is very specific in detecting similarity-supported genes because it uses an approach based on comparative sequence analysis. Glimmer employs a very sophisticated model of genomic sequence properties and is also sensitive in detecting organism-specific genes. Based on a data set of 113 microbial genome sequences, we optimized a combined application approach using different parameters relevant to the gene finding problem. This yields a significant improvement in specificity while sensitivity remains similar to that of Glimmer. The improvement is especially pronounced for GC-rich genomes. The method is currently being applied to the annotation of several microbial genomes.
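The abstract does not spell out the optimized combination rule, so the following is only a minimal sketch of one way a joint strategy and its evaluation could look. The `Gene` record, the rule in `merge_predictions` (keep every call from the more specific finder, add calls from the more sensitive finder only for reading frames the first one missed), and the definition of specificity as TP / (TP + FP) are all assumptions made for illustration, not the authors' method.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical record for one predicted or annotated gene."""
    strand: str  # '+' or '-'
    stop: int    # stop codon coordinate; identifies the open reading frame
    start: int   # predicted start coordinate


def merge_predictions(primary: Iterable[Gene], secondary: Iterable[Gene]) -> List[Gene]:
    """Assumed joint rule: primary calls win; secondary calls fill the gaps.

    Two predictions are treated as the same gene when they share strand
    and stop coordinate, regardless of the predicted start.
    """
    merged = {(g.strand, g.stop): g for g in secondary}
    # Primary predictions overwrite secondary ones for the same reading frame.
    merged.update({(g.strand, g.stop): g for g in primary})
    return sorted(merged.values(), key=lambda g: (g.stop, g.strand))


def sensitivity_specificity(predicted: Iterable[Gene],
                            annotated: Iterable[Gene]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the gene level.

    Assumption: sensitivity = TP / annotated genes and
    specificity = TP / predicted genes, i.e. TP / (TP + FP).
    """
    pred = {(g.strand, g.stop) for g in predicted}
    ref = {(g.strand, g.stop) for g in annotated}
    tp = len(pred & ref)
    sens = tp / len(ref) if ref else 0.0
    spec = tp / len(pred) if pred else 0.0
    return sens, spec
```

With this rule, passing the Critica output as `primary` and the Glimmer output as `secondary` keeps the similarity-supported calls and their start positions, while Glimmer contributes only genes Critica did not report. A real strategy would also need parameters such as overlap and length thresholds, which this sketch leaves out.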
Pages: 1622-1631
Page count: 10