Microindel detection in short-read sequence data

Cited by: 73
Authors
Krawitz, Peter [1, 2, 3]
Roedelsperger, Christian [1, 2, 3]
Jaeger, Marten [1, 3]
Jostins, Luke [4]
Bauer, Sebastian [1, 3]
Robinson, Peter N. [1, 2, 3]
Affiliations
[1] Charite, Inst Med Genet, D-13353 Berlin, Germany
[2] Berlin Brandenburg Ctr Regenerat Therapies, D-13353 Berlin, Germany
[3] Max Planck Inst Mol Genet, D-14195 Berlin, Germany
[4] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
Keywords
GENOME; EFFICIENT
DOI
10.1093/bioinformatics/btq027
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Several recent studies have demonstrated the effectiveness of resequencing and single nucleotide variant (SNV) detection on deep short-read sequencing platforms. While several reliable algorithms are available for automated SNV detection, the automated detection of microindels in deep short-read data presents a new bioinformatics challenge.
Results: We systematically analyzed how the short-read mapping tools MAQ, Bowtie, Burrows-Wheeler alignment tool (BWA), Novoalign and RazerS perform on simulated datasets that contain indels, and evaluated how indels affect error rates in SNV detection. We implemented a simple algorithm to compute the equivalent indel region (eir), which can be used to process the alignments produced by the mapping tools in order to perform indel calling. Using simulated data that contain indels, we demonstrate that indel detection works well on short-read data: the detection rate for microindels (<4 bp) is >90%. Our study provides insights into the systematic errors that arise when SNV detection is based on ungapped alignments of short sequence reads. Gapped alignments of short sequence reads can be used to reduce these errors and to detect microindels in simulated short-read data. A comparison with microindels automatically identified on the ABI Sanger and Roche 454 platforms indicates that microindel detection from short sequence reads identifies both overlapping and distinct indels.
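The abstract does not spell out how the equivalent indel region is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the region of a deletion is the range of start positions at which removing the same number of bases yields an identical sequence, and it handles deletions only. The function name `deletion_eir` and its 0-based coordinates are hypothetical choices made for this illustration.

```python
def deletion_eir(ref: str, start: int, length: int) -> tuple[int, int]:
    """Return (leftmost, rightmost) 0-based start positions at which deleting
    `length` bases from `ref` gives the same sequence as deleting
    ref[start:start + length].

    Hypothetical helper for illustration; not taken from the paper.
    """
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie inside the reference")

    # Shifting the deletion one base to the left leaves the result unchanged
    # exactly when the base entering the window on the left equals the base
    # leaving it on the right.
    left = start
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1

    # Symmetric condition for shifting one base to the right.
    right = start
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1

    return left, right


if __name__ == "__main__":
    # Deleting one "AC" unit from the repeat in GACACACT is ambiguous:
    # start positions 1 through 5 all produce GACACT.
    print(deletion_eir("GACACACT", 1, 2))
```

Under these assumptions, two deletion calls of the same length whose start positions fall inside the same region describe the same variant, which is why a region rather than a single coordinate is convenient when comparing calls from different mapping tools.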
Pages: 722-729
Page count: 8
Related papers
20 in total
  • [1] The first Korean genome sequence and analysis: Full genome sequencing for a socio-ethnic group
    Ahn, Sung-Min
    Kim, Tae-Hyung
    Lee, Sunghoon
    Kim, Deokhoon
    Ghang, Ho
    Kim, Dae-Soo
    Kim, Byoung-Chul
    Kim, Sang-Yoon
    Kim, Woo-Yeon
    Kim, Chulhong
    Park, Daeui
    Lee, Yong Seok
    Kim, Sangsoo
    Reja, Rohit
    Jho, Sungwoong
    Kim, Chang Geun
    Cha, Ji-Young
    Kim, Kyung-Hee
    Lee, Bonghee
    Bhak, Jong
    Kim, Seong-Jin
    [J]. GENOME RESEARCH, 2009, 19 (09) : 1622 - 1629
  • [2] Microdeletions and microinsertions causing human genetic disease: Common mechanisms of mutagenesis and the role of local DNA sequence complexity
    Ball, EV
    Stenson, PD
    Abeysinghe, SS
    Krawczak, M
    Cooper, DN
    Chuzhanova, NA
    [J]. HUMAN MUTATION, 2005, 26 (03) : 205 - 213
  • [3] Accurate whole human genome sequencing using reversible terminator chemistry
    Bentley, David R.
    Balasubramanian, Shankar
    Swerdlow, Harold P.
    Smith, Geoffrey P.
    Milton, John
    Brown, Clive G.
    Hall, Kevin P.
    Evers, Dirk J.
    Barnes, Colin L.
    Bignell, Helen R.
    Boutell, Jonathan M.
    Bryant, Jason
    Carter, Richard J.
    Cheetham, R. Keira
    Cox, Anthony J.
    Ellis, Darren J.
    Flatbush, Michael R.
    Gormley, Niall A.
    Humphray, Sean J.
    Irving, Leslie J.
    Karbelashvili, Mirian S.
    Kirk, Scott M.
    Li, Heng
    Liu, Xiaohai
    Maisinger, Klaus S.
    Murray, Lisa J.
    Obradovic, Bojan
    Ost, Tobias
    Parkinson, Michael L.
    Pratt, Mark R.
    Rasolonjatovo, Isabelle M. J.
    Reed, Mark T.
    Rigatti, Roberto
    Rodighiero, Chiara
    Ross, Mark T.
    Sabot, Andrea
    Sankar, Subramanian V.
    Scally, Aylwyn
    Schroth, Gary P.
    Smith, Mark E.
    Smith, Vincent P.
    Spiridou, Anastassia
    Torrance, Peta E.
    Tzonev, Svilen S.
    Vermaas, Eric H.
    Walter, Klaudia
    Wu, Xiaolin
    Zhang, Lu
    Alam, Mohammed D.
    Anastasi, Carole
    [J]. NATURE, 2008, 456 (7218) : 53 - 59
  • [4] Comprehensive identification and characterization of diallelic insertion-deletion polymorphisms in 330 human candidate genes
    Bhangale, TR
    Rieder, MJ
    Livingston, RJ
    Nickerson, DA
    [J]. HUMAN MOLECULAR GENETICS, 2005, 14 (01) : 59 - 69
  • [5] Durbin R., 1999, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
  • [6] Evaluation of next generation sequencing platforms for population targeted sequencing studies
    Harismendy, Olivier
    Ng, Pauline C.
    Strausberg, Robert L.
    Wang, Xiaoyun
    Stockwell, Timothy B.
    Beeson, Karen Y.
    Schork, Nicholas J.
    Murray, Sarah S.
    Topol, Eric J.
    Levy, Samuel
    Frazer, Kelly A.
    [J]. GENOME BIOLOGY, 2009, 10 (03):
  • [7] mreps: efficient and flexible detection of tandem repeats in DNA
    Kolpakov, R
    Bana, G
    Kucherov, G
    [J]. NUCLEIC ACIDS RESEARCH, 2003, 31 (13) : 3672 - 3678
  • [8] Korbel JO, 2007, SCIENCE, V318, P420, DOI 10.1126/science.1149504
  • [9] The mutation process of microsatellites during the polymerase chain reaction
    Lai, YL
    Shinde, D
    Arnheim, N
    Sun, FZ
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2003, 10 (02) : 143 - 155
  • [10] Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
    Langmead, Ben
    Trapnell, Cole
    Pop, Mihai
    Salzberg, Steven L.
    [J]. GENOME BIOLOGY, 2009, 10 (03):