Mapping short DNA sequencing reads and calling variants using mapping quality scores

Cited by: 1839
Authors
Li, Heng [1]
Ruan, Jue [2]
Durbin, Richard [1]
Affiliations
[1] Wellcome Trust Sanger Inst, Hinxton CB10 1SA, England
[2] Chinese Acad Sci, Beijing Genom Inst, Beijing 100029, Peoples R China
Funding
Wellcome Trust (UK)
DOI
10.1101/gr.078212.108
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
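The abstract describes mapping quality as a confidence measure attached to each read alignment. The sketch below illustrates one way such a score can be computed, assuming the conventional Phred scale (Q = -10 log10 of the error probability) and a uniform prior over candidate positions. The function names, the cap of 99, and the toy likelihood values are hypothetical choices for illustration; this is not MAQ's actual implementation, which the abstract does not detail.

```python
import math


def phred_from_error_prob(p_err: float, cap: int = 99) -> int:
    """Phred-scale an error probability: Q = -10 * log10(p_err), rounded and capped."""
    if not 0.0 <= p_err <= 1.0:
        raise ValueError("p_err must be in [0, 1]")
    if p_err == 0.0:
        return cap  # log10(0) is undefined, so report the (assumed) maximum score
    return min(cap, int(round(-10.0 * math.log10(p_err))))


def error_prob_from_phred(q: float) -> float:
    """Invert the Phred scale: p_err = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def mapping_quality(likelihoods: list[float], cap: int = 99) -> int:
    """Illustrative mapping quality for the best of several candidate positions.

    Each entry is the likelihood of the read given one candidate position.
    Under a uniform prior, the posterior of the best position is
    best / sum(all), and the mapping error probability is one minus that.
    """
    if not likelihoods or min(likelihoods) < 0.0:
        raise ValueError("need at least one non-negative likelihood")
    total = sum(likelihoods)
    if total == 0.0:
        return 0  # no candidate explains the read at all
    p_err = max(0.0, 1.0 - max(likelihoods) / total)
    return phred_from_error_prob(p_err, cap)


if __name__ == "__main__":
    # Toy values: one strong hit and one weak competing hit.
    print(mapping_quality([0.9, 0.1]))
    # Two equally good hits: the read is ambiguous, so the score is low.
    print(mapping_quality([0.5, 0.5]))
```

Under these assumptions, a read with two equally likely placements receives a low score, while a read with a single plausible placement receives the capped maximum; downstream genotype calling can then weight or filter reads by this value.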
Pages: 1851-1858
Page count: 8