FragGeneScan: predicting genes in short and error-prone reads

Cited by: 551
Authors
Rho, Mina [1]
Tang, Haixu [1,2]
Ye, Yuzhen [1]
Affiliations
[1] Indiana Univ, Sch Informat & Comp, Bloomington, IN 47408 USA
[2] Indiana Univ, Ctr Genom & Bioinformat, Bloomington, IN 47405 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
IDENTIFICATION; METAGENOMICS; MICROBIOME; GENOMICS
DOI
10.1093/nar/gkq747
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Advances in next-generation sequencing technology have facilitated metagenomics research, which attempts to determine directly the whole collection of genetic material within an environmental sample (i.e. the metagenome). Identifying genes directly from short reads has become an important yet challenging problem in annotating metagenomes, since an assembly of the metagenome is often not available. Gene predictors developed for whole genomes (e.g. Glimmer) and those recently developed for metagenomic sequences (e.g. MetaGene) show a significant decrease in performance as sequencing error rates increase or as reads get shorter. We have developed a novel gene prediction method, FragGeneScan, which combines sequencing error models and codon usage in a hidden Markov model to improve the prediction of protein-coding regions in short reads. The performance of FragGeneScan was comparable to that of Glimmer and MetaGene for complete genomes, but for short reads FragGeneScan consistently outperformed MetaGene (accuracy improved by ~62% for reads of 400 bases with 1% sequencing errors, and by ~18% for error-free short reads of 100 bases). When applied to metagenomes, FragGeneScan recovered substantially more genes than MetaGene predicted (>90% of the genes identified by homology search), as well as many novel genes with no homologs in current protein sequence databases.
Pages: 12