pIRS: Profile-based Illumina pair-end reads simulator

Cited by: 133
Authors
Hu, Xuesong [1, 2]
Yuan, Jianying [1]
Shi, Yujian [1]
Lu, Jianliang [1]
Liu, Binghang [1]
Li, Zhenyu [1]
Chen, Yanxiang [1]
Mu, Desheng [1]
Zhang, Hao [1]
Li, Nan [1]
Yue, Zhen [1]
Bai, Fan [2]
Li, Heng [3]
Fan, Wei [1, 2]
Affiliations
[1] BGI Shenzhen, Shenzhen 518083, Peoples R China
[2] Peking Univ, Biodynam Opt Imaging Ctr, Beijing 100871, Peoples R China
[3] Broad Inst, Med Populat Genet Program, Cambridge, MA 02142 USA
Keywords
ALIGNMENT;
DOI
10.1093/bioinformatics/bts187
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: The next-generation high-throughput sequencing technologies, especially from Illumina, have been widely used in re-sequencing and de novo assembly studies. However, there is no existing software that can simulate Illumina reads with real error and quality distributions and coverage bias yet, which is very useful in relevant software development and study designing of sequencing projects.
Results: We provide a software package, pIRS (profile-based Illumina pair-end reads simulator), which simulates Illumina reads with empirical Base-Calling and GC%-depth profiles trained from real re-sequencing data. The error and quality distributions as well as coverage bias patterns of simulated reads using pIRS fit the properties of real sequencing data better than existing simulators. In addition, pIRS also comes with a tool to simulate the heterozygous diploid genomes.
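The idea of profile-based simulation described in the abstract can be illustrated with a minimal sketch. This is not the pIRS implementation: the function names, the per-cycle quality profile format, and the GC-depth callback are all hypothetical, and the sketch covers only single-end reads with substitution errors (no pairing, indels, or diploid variation).

```python
import random


def gc_fraction(seq):
    """Fraction of G/C bases in seq (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def phred_to_error_prob(q):
    """Standard Phred definition: P(error) = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def simulate_read(reference, read_len, quality_profile, gc_depth_profile, rng):
    """Simulate one read; return (start, read, qualities) or None if rejected.

    quality_profile (hypothetical format): one dict per cycle mapping a
        quality score to its empirical weight at that cycle.
    gc_depth_profile (hypothetical): function mapping a GC fraction to a
        relative depth in [0, 1], used as an acceptance probability.
    """
    start = rng.randrange(len(reference) - read_len + 1)
    fragment = reference[start:start + read_len]

    # Coverage bias: keep the fragment with probability equal to the
    # relative depth observed for its GC content.
    if rng.random() >= gc_depth_profile(gc_fraction(fragment)):
        return None

    bases, quals = [], []
    for cycle, true_base in enumerate(fragment):
        # Draw a quality score from the empirical distribution of this cycle.
        dist = quality_profile[cycle]
        q = rng.choices(list(dist.keys()), weights=list(dist.values()))[0]
        # Introduce a substitution with the probability implied by the score.
        if rng.random() < phred_to_error_prob(q):
            base = rng.choice([b for b in "ACGT" if b != true_base])
        else:
            base = true_base
        bases.append(base)
        quals.append(q)
    return start, "".join(bases), quals
```

In this sketch both profiles are passed in ready-made; in a real workflow they would be estimated from aligned re-sequencing data, as the abstract describes.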
Pages: 1533-1535
Page count: 3