MetaSim - A Sequencing Simulator for Genomics and Metagenomics

Times cited: 288
Authors
Richter, Daniel C. [1]
Ott, Felix [1]
Auch, Alexander F. [1]
Schmid, Ramona [2]
Huson, Daniel H. [1]
Affiliations
[1] Univ Tubingen, ZBIT Ctr Bioinformat Tubingen, Tubingen, Germany
[2] Boehringer Ingelheim Pharma GmbH & Co KG, Biberach, Germany
Source
PLOS ONE | 2008 / Vol. 3 / No. 10
DOI
10.1371/journal.pone.0003373
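Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy, earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09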
Abstract
Background: The new research field of metagenomics is providing exciting insights into various, previously unclassified ecological systems. Next-generation sequencing technologies are producing a rapid increase of environmental data in public databases. There is great need for specialized software solutions and statistical methods for dealing with complex metagenome data sets.

Methodology/Principal Findings: To facilitate the development and improvement of metagenomic tools and the planning of metagenomic projects, we introduce a sequencing simulator called MetaSim. Our software can be used to generate collections of synthetic reads that reflect the diverse taxonomical composition of typical metagenome data sets. Based on a database of given genomes, the program allows the user to design a metagenome by specifying the number of genomes present at different levels of the NCBI taxonomy, and then to collect reads from the metagenome using a simulation of a number of different sequencing technologies. A population sampler optionally produces evolved sequences based on source genomes and a given evolutionary tree.

Conclusions/Significance: MetaSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software.
Pages: 12
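The abstract describes a two-step process: first design a metagenome from a set of genomes and their abundances, then collect reads from it under a simulated sequencing technology. The Python sketch below shows a rough version of that read-collection step. It is not MetaSim's implementation, and it does not use MetaSim's error models. Several parts are assumptions made only for this illustration and are hypothetical:

- the function names (`simulate_reads`, `add_substitutions`);
- weighting each genome by abundance times length;
- the uniform, substitution-only error model.

```python
import random
from dataclasses import dataclass

# Hypothetical sketch, not MetaSim's code: abundance-weighted read sampling
# from a toy metagenome with a uniform substitution-only error model.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class Read:
    genome_id: str  # which source genome the read came from
    start: int      # 0-based start of the fragment on the forward strand
    strand: str     # "+" for forward, "-" for reverse complement
    sequence: str   # read sequence after errors were added


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def add_substitutions(seq: str, error_rate: float, rng: random.Random) -> str:
    """Replace each base with a different base with probability error_rate."""
    bases = list(seq)
    for i, base in enumerate(bases):
        if rng.random() < error_rate:
            bases[i] = rng.choice([b for b in "ACGT" if b != base])
    return "".join(bases)


def simulate_reads(genomes, abundances, n_reads, read_len, error_rate, seed=0):
    """Draw n_reads fixed-length reads from a synthetic metagenome.

    genomes:    dict genome_id -> DNA sequence
    abundances: dict genome_id -> relative abundance (assumed: number of copies)
    """
    rng = random.Random(seed)
    ids = [g for g, s in genomes.items() if len(s) >= read_len]
    # Assumption: a genome's chance of yielding a read is proportional to its
    # abundance times the number of valid start positions it offers.
    weights = [abundances.get(g, 0.0) * (len(genomes[g]) - read_len + 1) for g in ids]
    reads = []
    for _ in range(n_reads):
        gid = rng.choices(ids, weights=weights, k=1)[0]
        seq = genomes[gid]
        start = rng.randrange(len(seq) - read_len + 1)
        fragment = seq[start:start + read_len]
        strand = rng.choice("+-")
        if strand == "-":
            fragment = reverse_complement(fragment)
        reads.append(Read(gid, start, strand, add_substitutions(fragment, error_rate, rng)))
    return reads


if __name__ == "__main__":
    toy = {"gA": "ACGTTGCAACGTTGCA" * 4, "gB": "GGGATTTACCCA" * 5}
    for r in simulate_reads(toy, {"gA": 3.0, "gB": 1.0}, n_reads=5, read_len=10, error_rate=0.01):
        print(r.genome_id, r.strand, r.start, r.sequence)
```

Real sequencing technologies differ in read length and error profile. Some produce errors that insert or delete bases rather than substitute them. A fuller simulator would therefore replace `add_substitutions` with a per-technology model. This sketch keeps only the simplest case so that the sampling logic stays visible.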