Pipeliner: software to evaluate the performance of bioinformatics pipelines for next-generation resequencing

Cited by: 8
Authors
Nevado, B. [1, 2]
Perez-Enciso, M. [1, 2, 3]
Affiliations
[1] CSIC IRTA UAB UB, CRAG, Bellaterra 08193, Spain
[2] Univ Autonoma Barcelona, Bellaterra 08193, Spain
[3] ICREA, Barcelona 08010, Spain
Keywords
bioinformatics pipelines; experimental design; individual resequencing; next-generation sequencing; simulation; POPULATION GENETIC INFERENCE; SEQUENCING DATA; READ ALIGNMENT; ALGORITHMS; SIMULATOR; GENOMICS
DOI
10.1111/1755-0998.12286
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The choice of technology and bioinformatics approach is critical in obtaining accurate and reliable information from next-generation sequencing (NGS) experiments. An increasing number of software and methodological guidelines are being published, but deciding upon which approach and experimental design to use can depend on the particularities of the species and on the aims of the study. This leaves researchers unable to produce informed decisions on these central questions. To address these issues, we developed pipeliner - a tool to evaluate, by simulation, the performance of NGS pipelines in resequencing studies. Pipeliner provides a graphical interface allowing the users to write and test their own bioinformatics pipelines with publicly available or custom software. It computes a number of statistics summarizing the performance in SNP calling, including the recovery, sensitivity and false discovery rate for heterozygous and homozygous SNP genotypes. Pipeliner can be used to answer many practical questions, for example, for a limited amount of NGS effort, how many more reliable SNPs can be detected by doubling coverage and halving sample size or what is the false discovery rate provided by different SNP calling algorithms and options. Pipeliner thus allows researchers to carefully plan their study's sampling design and compare the suitability of alternative bioinformatics approaches for their specific study systems. Pipeliner is written in C++ and is freely available from http://github.com/brunonevado/Pipeliner.
Pages: 99-106
Number of pages: 8