Pash 3.0: A versatile software package for read mapping and integrative analysis of genomic and epigenomic variation using massively parallel DNA sequencing

Cited by: 32
Authors
Coarfa, Cristian [1]
Yu, Fuli [1, 2]
Miller, Christopher A. [1]
Chen, Zuozhou [1]
Harris, R. Alan [1]
Milosavljevic, Aleksandar [1]
Affiliations
[1] Baylor Coll Med, Dept Mol & Human Genet, Houston, TX 77030 USA
[2] Baylor Coll Med, Human Genome Sequencing Ctr, Houston, TX 77030 USA
Source
BMC BIOINFORMATICS | 2010 / Volume 11
Funding
U.S. National Institutes of Health;
Keywords
METHYLATION; ALIGNMENT; IDENTIFICATION; GENERATION; DATABASE; PROTEIN; BLAST;
DOI
10.1186/1471-2105-11-572
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Massively parallel sequencing readouts of epigenomic assays are enabling integrative genome-wide analyses of genomic and epigenomic variation. Pash 3.0 performs sequence comparison and read mapping and can be employed as a module within diverse configurable analysis pipelines, including ChIP-Seq and methylome mapping by whole-genome bisulfite sequencing.
Results: Pash 3.0 generally matches the accuracy and speed of niche programs for fast mapping of short reads, and exceeds their performance on longer reads generated by a new generation of massively parallel sequencing technologies. By exploiting longer read lengths, Pash 3.0 maps reads onto the large fraction of genomic DNA that contains repetitive elements and polymorphic sites, including indel polymorphisms.
Conclusions: We demonstrate the versatility of Pash 3.0 by analyzing the interaction between CpG methylation, CpG SNPs, and imprinting based on publicly available whole-genome shotgun bisulfite sequencing data. Pash 3.0 makes use of gapped k-mer alignment, a non-seed based comparison method, which is implemented using multi-positional hash tables. This allows Pash 3.0 to run on diverse hardware platforms, including individual computers with standard RAM capacity, multi-core hardware architectures and large clusters.
Pages: 11