A probabilistic method for the detection and genotyping of small indels from population-scale sequence data

Times cited: 17
Authors
Bansal, Vikas [1]
Libiger, Ondrej [1, 2]
Affiliations
[1] Scripps Translat Sci Inst, La Jolla, CA 92037 USA
[2] Scripps Res Inst, Dept Mol & Expt Med, La Jolla, CA 92037 USA
Funding
National Institutes of Health (NIH)
Keywords
SHORT-READ; GENOME; ALIGNMENT; INSERTION
DOI
10.1093/bioinformatics/btr344
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: High-throughput sequencing technologies have made population-scale studies of human genetic variation possible. Accurate and comprehensive detection of DNA sequence variants is crucial for the success of these studies. Small insertions and deletions (indels) are the second most frequent class of variation in the human genome after single nucleotide polymorphisms (SNPs). Although several tools are available for the gapped alignment of sequence reads to a reference genome, computational methods are still needed to discriminate indels from sequencing errors and to genotype indels directly from sequence reads.

Results: We describe a probabilistic method for the accurate detection and genotyping of short indels from population-scale sequence data. In this approach, aligned sequence reads from a population of individuals are used to automatically account for context-specific sequencing errors associated with indels. We applied this approach to population sequence datasets from the 1000 Genomes exon pilot project, generated on the Roche 454 and Illumina sequencing platforms, and detected significantly more indels than previously reported. Comparison with indels identified in the 1000 Genomes pilot project demonstrated the sensitivity of our method. The consistency, across different human populations and two sequencing platforms, of both the number of indels and the fraction of indels whose length is a multiple of three indicated that our method has a low false discovery rate. Our method represents a general approach for the detection and genotyping of small-scale DNA sequence variants in population-scale sequencing projects.
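The abstract does not spell out the model itself, so the following is only a generic illustration of population-based indel calling, not the authors' method. It is a minimal sketch in Python assuming (a) each individual's reads at one candidate indel site are reduced to counts (k, n) of indel-supporting and total reads, (b) one error rate per site, shared by all individuals and estimated from the whole population (a crude stand-in for the context-specific error modeling described above), and (c) Hardy-Weinberg genotype priors. The function names `read_likelihood`, `fit_em` and `call_site` are hypothetical. Expectation-maximization jointly estimates the indel allele frequency and the site error rate, and a likelihood-ratio statistic compares that fit with a null model in which no individual carries the indel.

```python
from math import comb, log

# Illustrative sketch only (not the paper's model): each individual's reads at one
# candidate indel site are summarized as (k, n) = (indel-supporting reads, total reads).


def clamp(err: float) -> float:
    """Keep the error rate away from 0 and 0.5 so likelihoods stay well defined."""
    return min(max(err, 1e-6), 0.49)


def read_likelihood(k: int, n: int, g: int, err: float) -> float:
    """Binomial P(k of n reads show the indel | g indel alleles), g in {0, 1, 2}."""
    # A read shows the indel if it comes from an indel haplotype and is read
    # correctly, or from a reference haplotype and carries an indel-like error.
    q = (g / 2) * (1 - err) + (1 - g / 2) * err
    return comb(n, k) * q**k * (1 - q) ** (n - k)


def genotype_prior(g: int, p: float) -> float:
    """Hardy-Weinberg prior for g indel alleles at indel allele frequency p."""
    return ((1 - p) ** 2, 2 * p * (1 - p), p**2)[g]


def posteriors(k: int, n: int, p: float, err: float):
    """Genotype posteriors for one individual, plus the marginal likelihood."""
    w = [genotype_prior(g, p) * read_likelihood(k, n, g, err) for g in range(3)]
    z = sum(w)
    return [x / z for x in w], z


def log_likelihood(counts, p: float, err: float) -> float:
    return sum(log(posteriors(k, n, p, err)[1]) for k, n in counts)


def fit_em(counts, iters: int = 100):
    """Jointly estimate allele frequency p and a site-specific error rate by EM."""
    p, err = 0.1, 0.01
    for _ in range(iters):
        alt_alleles = err_reads = informative_reads = 0.0
        for k, n in counts:
            post, _ = posteriors(k, n, p, err)
            alt_alleles += post[1] + 2 * post[2]
            # Heterozygote reads show the indel with probability 0.5 whatever err
            # is, so only (expected) homozygotes inform the error-rate update.
            err_reads += post[0] * k + post[2] * (n - k)
            informative_reads += (post[0] + post[2]) * n
        p = alt_alleles / (2 * len(counts))
        if informative_reads > 0:
            err = clamp(err_reads / informative_reads)
    return p, err


def call_site(counts):
    """Return (allele frequency, error rate, likelihood-ratio statistic, genotype calls)."""
    p_hat, err_hat = fit_em(counts)
    # Null model: nobody carries the indel, so every indel-supporting read is an error.
    err_null = clamp(sum(k for k, _ in counts) / sum(n for _, n in counts))
    lr = 2 * (log_likelihood(counts, p_hat, err_hat) - log_likelihood(counts, 0.0, err_null))
    calls = []
    for k, n in counts:
        post, _ = posteriors(k, n, p_hat, err_hat)
        calls.append(post.index(max(post)))  # maximum a posteriori genotype
    return p_hat, err_hat, max(lr, 0.0), calls


if __name__ == "__main__":
    # Toy input for illustration, not data from the paper.
    counts = [(0, 20), (1, 20), (0, 15), (10, 20), (8, 18), (20, 20), (0, 22), (1, 25)]
    p_hat, err_hat, lr, calls = call_site(counts)
    print(calls)  # → [0, 0, 0, 1, 1, 2, 0, 0]
```

Pooling all individuals lets the error rate be learned per site, so a site where many samples each show a few indel-supporting reads tends to be treated as error-prone rather than polymorphic. The LR statistic could be compared with a threshold such as 3.84 (the 5% chi-square cutoff for one degree of freedom), though this is only a rough heuristic because p = 0 lies on the boundary of the parameter space. A practical caller would also work in log space to avoid underflow at high coverage and would model alignment ambiguity around repeats; this sketch attempts neither.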
Pages: 2047-2053
Page count: 7