Comparison of insertion/deletion calling algorithms on human next-generation sequencing data

Cited by: 45
Authors
Ghoneim D.H. [1]
Myers J.R. [2]
Tuttle E. [1]
Paciorkowski A.R. [1, 3]
Affiliations
[1] Center for Neural Development and Disease, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY
[2] Genomics Research Center, University of Rochester Medical Center, Rochester, NY
[3] Departments of Neurology, Pediatrics, and Biomedical Genetics, University of Rochester Medical Center, Rochester, NY
Funding
National Institutes of Health (United States)
Keywords
Concordance; GATK; Indels; Next generation sequencing; Pindel; Validation
DOI
10.1186/1756-0500-7-864
Abstract
Background: Insertions/deletions (indels) are the second most common type of genomic variant and the most common type of structural variant. Identifying indels in next generation sequencing data is a challenge, and the algorithms commonly used for indel detection have not been compared on a research cohort of human subject genomic data. Guidelines for the optimal detection of biologically significant indels are limited. We analyzed three sets of human next generation sequencing data (48 samples of 200-gene targeted exon sequencing, 45 samples of whole exome sequencing, and 2 samples of whole genome sequencing) using three algorithms for indel detection (Pindel, and the Genome Analysis Toolkit's UnifiedGenotyper and HaplotypeCaller).

Results: We observed variation in indel calls across the three algorithms. The intersection of the three tools comprised only 5.70% of targeted exon, 19.52% of whole exome, and 14.25% of whole genome indel calls. The majority of the discordant indels had lower read depth and were likely to be false positives. When software parameters were kept consistent across the three targets, HaplotypeCaller produced the most reliable results. Pindel results did not validate well without adjusting parameters to account for varied read depth and number of samples per run. Adjusting Pindel's M (minimum support for event) parameter improved both concordance and validation rates. Pindel was able to identify large deletions that exceeded the length capabilities of the GATK algorithms.

Conclusions: Despite the observed variability in indel identification, we discerned strengths among the individual algorithms on specific data sets. This allowed us to suggest best practices for indel calling. Pindel's low validation rate for indel calls made in targeted exon sequencing suggests that HaplotypeCaller is better suited for short indels and multi-sample runs in targets with very high read depth. Pindel allows for optimization of minimum support for events and is best used for detection of larger indels at lower read depths. © 2014 Ghoneim et al.
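
The three-way intersection reported in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes that two calls match only when chromosome, position, reference allele, and alternate allele are identical, and that the percentage is taken over the union of all calls from the three tools; the record does not state either choice. The `Indel` type, the `three_way_concordance` function, and the toy call sets are hypothetical names and data introduced only for illustration.

```python
from typing import Iterable, NamedTuple


class Indel(NamedTuple):
    """Hypothetical key for one indel call (assumed exact-match criterion)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def three_way_concordance(a: Iterable[Indel],
                          b: Iterable[Indel],
                          c: Iterable[Indel]) -> float:
    """Percent of all distinct calls that were made by all three callers.

    Assumption: the denominator is the union of the three call sets.
    """
    set_a, set_b, set_c = set(a), set(b), set(c)
    union = set_a | set_b | set_c
    if not union:
        return 0.0  # no calls at all, so nothing to compare
    shared = set_a & set_b & set_c
    return 100.0 * len(shared) / len(union)


# Toy call sets (made-up data, not taken from the study).
pindel = {
    Indel("chr1", 100, "AT", "A"),
    Indel("chr1", 250, "G", "GCC"),
    Indel("chr2", 75, "CTT", "C"),
}
unified_genotyper = {
    Indel("chr1", 100, "AT", "A"),
    Indel("chr2", 75, "CTT", "C"),
}
haplotype_caller = {
    Indel("chr1", 100, "AT", "A"),
    Indel("chr2", 75, "CTT", "C"),
    Indel("chr3", 10, "T", "TA"),
}

pct = three_way_concordance(pindel, unified_genotyper, haplotype_caller)
print(f"Three-way concordance: {pct:.2f}%")
```

In practice, callers often represent the same indel at different positions within repeats, so real comparisons usually left-align and normalize calls before matching; exact matching as used here would understate concordance.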