Development of joint application strategies for two microbial gene finders

Cited by: 59
Authors
McHardy, AC
Goesmann, A
Pühler, A
Meyer, F [1]
Affiliations
[1] Univ Bielefeld, Dept Biol, Ctr Biotechnol, D-33594 Bielefeld, Germany
[2] Univ Bielefeld, Dept Biol, Lehrstuhl Genet, D-33594 Bielefeld, Germany
DOI
10.1093/bioinformatics/bth137
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: As a starting point in the annotation of bacterial genomes, gene finding programs are used to predict functional elements in the DNA sequence. Given the accelerating pace and growing number of genome projects currently underway, high-performing methods for this task are becoming especially important.

Results: This study describes the development of joint application strategies that combine the strengths of two microbial gene finders to improve overall gene finding performance. Critica is highly specific in detecting similarity-supported genes because it uses an approach based on comparative sequence analysis. Glimmer employs a sophisticated model of genomic sequence properties and is also sensitive in detecting organism-specific genes. Based on a data set of 113 microbial genome sequences, we optimized a combined application approach over several parameters relevant to the gene finding problem. The result is a significant improvement in specificity, with sensitivity similar to that of Glimmer. The improvement is especially pronounced for GC-rich genomes. The method is currently being applied to the annotation of several microbial genomes.
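The abstract does not spell out the combination rule, so what follows is only a minimal Python sketch of the general idea, not the authors' method. It assumes a hypothetical rule: keep every Critica call, then add Glimmer calls that Critica missed, provided they are long enough and do not substantially overlap a Critica call. The parameter names `max_overlap` and `min_length` and their default values are hypothetical and arbitrary. The sketch also assumes, as is common in prokaryotic gene-finder evaluations, that a prediction is correct when it shares strand and stop position with a reference gene, and that specificity means the fraction of predictions that are correct, TP / (TP + FP). The toy coordinates are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    start: int   # leftmost coordinate, 1-based
    end: int     # rightmost coordinate, 1-based, inclusive
    strand: str  # "+" or "-"

    @property
    def stop(self) -> int:
        # The stop codon ends at the right edge on "+" and at the left edge on "-".
        return self.end if self.strand == "+" else self.start

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlap(a: Gene, b: Gene) -> int:
    """Number of bases shared by two genes, regardless of strand."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def combine(critica: list[Gene], glimmer: list[Gene],
            max_overlap: int = 60, min_length: int = 300) -> list[Gene]:
    """Hypothetical joint rule: trust every Critica call (high specificity),
    then add Glimmer calls that Critica missed, provided they are at least
    min_length bp long and overlap no Critica call by more than max_overlap bp."""
    combined = list(critica)
    critica_stops = {(g.strand, g.stop) for g in critica}
    for g in glimmer:
        if (g.strand, g.stop) in critica_stops:
            continue  # same gene already predicted by Critica
        if g.length < min_length:
            continue  # short Glimmer-only calls are treated as likely false positives
        if any(overlap(g, c) > max_overlap for c in critica):
            continue  # conflicts with a similarity-supported Critica gene
        combined.append(g)
    return combined


def evaluate(predicted: list[Gene], reference: list[Gene]) -> tuple[float, float]:
    """Return (sensitivity, specificity), matching genes by strand and stop position.
    Sensitivity = TP / (TP + FN); specificity = TP / (TP + FP)."""
    ref_keys = {(g.strand, g.stop) for g in reference}
    pred_keys = {(g.strand, g.stop) for g in predicted}
    tp = len(ref_keys & pred_keys)
    sensitivity = tp / len(ref_keys) if ref_keys else 0.0
    specificity = tp / len(pred_keys) if pred_keys else 0.0
    return sensitivity, specificity


# Invented toy data for illustration only.
reference = [Gene(100, 999, "+"), Gene(1200, 2099, "-"), Gene(3000, 3899, "+")]
critica = [Gene(100, 999, "+")]
glimmer = [
    Gene(130, 999, "+"),    # same stop as the Critica call, different start
    Gene(1200, 2099, "-"),  # true gene missed by Critica
    Gene(3000, 3899, "+"),  # true gene missed by Critica
    Gene(4000, 4149, "+"),  # short false positive
    Gene(700, 1600, "+"),   # false positive overlapping the Critica gene
    Gene(5000, 5599, "+"),  # false positive that passes both filters
]

print(evaluate(critica, reference))                    # (0.3333333333333333, 1.0)
print(evaluate(glimmer, reference))                    # (1.0, 0.5)
print(evaluate(combine(critica, glimmer), reference))  # (1.0, 0.75)
```

On this toy data the combined set keeps Glimmer's sensitivity while discarding Glimmer-only calls that are short or conflict with a Critica gene, which qualitatively mirrors the specificity gain described in the abstract. In a real setting, thresholds like these could be tuned against annotated genomes rather than fixed by hand.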
Pages: 1622-1631
Page count: 10