The effect of sequencing errors on metagenomic gene prediction

Cited by: 76
Authors
Hoff, Katharina J. [1, 2]
Affiliations
[1] Univ Gottingen, Inst Microbiol & Genet, Dept Bioinformat, Gottingen, Germany
[2] Univ Gottingen, Int Max Planck Res Sch Mol Biol, Gottingen, Germany
Source
BMC GENOMICS | 2009 / Volume 10
Keywords
ACCURACY;
DOI
10.1186/1471-2164-10-520
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005; 0836; 090102; 100705;
Abstract
Background: Gene prediction is an essential step in the annotation of metagenomic sequencing reads. Since most metagenomic reads cannot be assembled into long contigs, specialized statistical gene prediction tools have been developed for short and anonymous DNA fragments, e.g., MetaGeneAnnotator and Orphelia. While conventional gene prediction methods have been subject to a benchmark study on real sequencing reads with typical errors, such a comparison has not yet been conducted for specialized tools. Their gene prediction accuracy was mostly measured on error-free DNA fragments. Results: In this study, Sanger and pyrosequencing reads were simulated on the basis of models that take all types of sequencing errors into account. All metagenomic gene prediction tools showed decreasing accuracy with increasing sequencing error rates. Performance results on an established metagenomic benchmark dataset are also reported. In addition, we demonstrate that ESTScan, a tool for sequencing error compensation in eukaryotic expressed sequence tags, outperforms some metagenomic gene prediction tools on reads with high error rates, although it was not designed for the task at hand. Conclusion: This study fills an important gap in metagenomic gene prediction research. Specialized methods are evaluated and compared with respect to sequencing error robustness. Results indicate that integrating error-compensating methods into metagenomic gene prediction tools would improve metagenome annotation quality.
Pages: 9