Quality scores and SNP detection in sequencing-by-synthesis systems

Cited by: 196
Authors
Brockman, William [1, 2]
Alvarez, Pablo [1, 2]
Young, Sarah [1, 2]
Garber, Manuel [1, 2]
Giannoukos, Georgia [1, 2]
Lee, William L. [1, 2]
Russ, Carsten [1, 2]
Lander, Eric S. [1, 2, 3]
Nusbaum, Chad [1, 2]
Jaffe, David B. [1, 2]
Affiliations
[1] MIT, Broad Inst, Cambridge, MA 02141 USA
[2] Harvard Univ, Broad Inst, Cambridge, MA 02141 USA
[3] MIT, Whitehead Inst Biomed Res, Cambridge, MA 02139 USA
DOI
10.1101/gr.070227.107
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Promising new sequencing technologies, based on sequencing-by-synthesis (SBS), are starting to deliver large amounts of DNA sequence at very low cost. Polymorphism detection is a key application. We describe general methods for improved quality scores and accurate automated polymorphism detection, and apply them to data from the Roche (454) Genome Sequencer 20. We assess our methods using known-truth data sets, which is critical to the validity of the assessments. We developed informative, base-by-base error predictors for this sequencer and used a variant of the phred binning algorithm to combine them into a single empirically derived quality score. These quality scores are more useful than those produced by the system software: They both better predict actual error rates and identify many more high-quality bases. We developed a SNP detection method, with variants for low coverage, high coverage, and PCR amplicon applications, and evaluated it on known-truth data sets. We demonstrate good specificity in single reads, and excellent specificity (no false positives in 215 kb of genome) in high-coverage data.
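The idea of an empirically derived quality score can be illustrated with a minimal sketch. It assumes the standard phred scaling, Q = -10 * log10(p), and a known-truth data set in which every base has already been assigned to a predictor bin and marked as correct or erroneous. The function names, the bin labels, the pseudocount, and the cap of 40 are all hypothetical choices made for illustration. This is a plain per-bin lookup table, which is much simpler than the variant of the phred binning algorithm the abstract refers to, and it is not the authors' implementation.

```python
import math
from collections import defaultdict


def phred(error_rate, cap=40):
    """Convert an error probability to a phred-scaled quality, Q = -10*log10(p).

    The cap is an illustrative assumption; it keeps the score finite
    when the error rate is zero.
    """
    if error_rate <= 0:
        return cap
    return min(cap, int(round(-10 * math.log10(error_rate))))


def build_quality_table(observations):
    """Map each predictor bin to an empirical quality score.

    observations: iterable of (bin_key, is_error) pairs taken from
    known-truth data, where is_error is True for a miscalled base.
    """
    counts = defaultdict(lambda: [0, 0])  # bin_key -> [errors, total]
    for bin_key, is_error in observations:
        counts[bin_key][0] += int(is_error)
        counts[bin_key][1] += 1
    # Add a pseudocount of one error so that a bin with no observed
    # errors does not receive an unbounded quality score.
    return {
        key: phred((errors + 1) / (total + 1))
        for key, (errors, total) in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical data: bin "a" has 999 correct bases, bin "b" has
    # 9 errors among 99 bases.
    obs = [("a", False)] * 999 + [("b", True)] * 9 + [("b", False)] * 90
    print(build_quality_table(obs))
```

With the hypothetical data above, bin "a" has an estimated error rate of 1/1000 and bin "b" one of 10/100, so they map to quality 30 and 10 respectively. Scores built this way are only as trustworthy as the known-truth set behind them, which is why the abstract stresses that point.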
Pages: 763-770
Page count: 8
Related papers
10 entries in total
[1]   An SNP map of the human genome generated by reduced representation shotgun sequencing [J].
Altshuler, D ;
Pollara, VJ ;
Cowles, CR ;
Van Etten, WJ ;
Baldwin, J ;
Linton, L ;
Lander, ES .
NATURE, 2000, 407 (6803) :513-516
[2]  
*APPL BIOS INC, 2004, US B APPL BIOS INC
[3]   Base-calling of automated sequencer traces using phred. II. Error probabilities [J].
Ewing, B ;
Green, P .
GENOME RESEARCH, 1998, 8 (03) :186-194
[4]   Base-calling of automated sequencer traces using phred. I. Accuracy assessment [J].
Ewing, B ;
Hillier, L ;
Wendl, MC ;
Green, P .
GENOME RESEARCH, 1998, 8 (03) :175-185
[5]   The International HapMap Project [J].
Gibbs, RA ;
Belmont, JW ;
Hardenbol, P ;
Willis, TD ;
Yu, FL ;
Yang, HM ;
Ch'ang, LY ;
Huang, W ;
Liu, B ;
Shen, Y ;
Tam, PKH ;
Tsui, LC ;
Waye, MMY ;
Wong, JTF ;
Zeng, CQ ;
Zhang, QR ;
Chee, MS ;
Galver, LM ;
Kruglyak, S ;
Murray, SS ;
Oliphant, AR ;
Montpetit, A ;
Hudson, TJ ;
Chagnon, F ;
Ferretti, V ;
Leboeuf, M ;
Phillips, MS ;
Verner, A ;
Kwok, PY ;
Duan, SH ;
Lind, DL ;
Miller, RD ;
Rice, JP ;
Saccone, NL ;
Taillon-Miller, P ;
Xiao, M ;
Nakamura, Y ;
Sekine, A ;
Sorimachi, K ;
Tanaka, T ;
Tanaka, Y ;
Tsunoda, T ;
Yoshino, E ;
Bentley, DR ;
Deloukas, P ;
Hunt, S ;
Powell, D ;
Altshuler, D ;
Gabriel, SB ;
Qiu, RZ .
NATURE, 2003, 426 (6968) :789-796
[6]   Whole-genome sequence assembly for mammalian genomes: Arachne 2 [J].
Jaffe, DB ;
Butler, J ;
Gnerre, S ;
Mauceli, E ;
Lindblad-Toh, K ;
Mesirov, JP ;
Zody, MC ;
Lander, ES .
GENOME RESEARCH, 2003, 13 (01) :91-96
[7]   The UCSC Genome Browser Database: Update 2007 [J].
Kuhn, R. M. ;
Karolchik, D. ;
Zweig, A. S. ;
Trumbower, H. ;
Thomas, D. J. ;
Thakkapallayil, A. ;
Sugnet, C. W. ;
Stanke, M. ;
Smith, K. E. ;
Siepel, A. ;
Rosenbloom, K. R. ;
Rhead, B. ;
Raney, B. J. ;
Pohl, A. ;
Pedersen, J. S. ;
Hsu, F. ;
Hinrichs, A. S. ;
Harte, R. A. ;
Diekhans, M. ;
Clawson, H. ;
Bejerano, G. ;
Barber, G. P. ;
Baertsch, R. ;
Haussler, D. ;
Kent, W. J. .
NUCLEIC ACIDS RESEARCH, 2007, 35 :D668-D673
[8]   Genome sequencing in microfabricated high-density picolitre reactors [J].
Margulies, M ;
Egholm, M ;
Altman, WE ;
Attiya, S ;
Bader, JS ;
Bemben, LA ;
Berka, J ;
Braverman, MS ;
Chen, YJ ;
Chen, ZT ;
Dewell, SB ;
Du, L ;
Fierro, JM ;
Gomes, XV ;
Godwin, BC ;
He, W ;
Helgesen, S ;
Ho, CH ;
Irzyk, GP ;
Jando, SC ;
Alenquer, MLI ;
Jarvie, TP ;
Jirage, KB ;
Kim, JB ;
Knight, JR ;
Lanza, JR ;
Leamon, JH ;
Lefkowitz, SM ;
Lei, M ;
Li, J ;
Lohman, KL ;
Lu, H ;
Makhijani, VB ;
McDade, KE ;
McKenna, MP ;
Myers, EW ;
Nickerson, E ;
Nobile, JR ;
Plant, R ;
Puc, BP ;
Ronan, MT ;
Roth, GT ;
Sarkis, GJ ;
Simons, JF ;
Simpson, JW ;
Srinivasan, M ;
Tartaro, KR ;
Tomasz, A ;
Vogt, KA ;
Volkmer, GA .
NATURE, 2005, 437 (7057) :376-380
[9]   A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms [J].
Sachidanandam, R ;
Weissman, D ;
Schmidt, SC ;
Kakol, JM ;
Stein, LD ;
Marth, G ;
Sherry, S ;
Mullikin, JC ;
Mortimore, BJ ;
Willey, DL ;
Hunt, SE ;
Cole, CG ;
Coggill, PC ;
Rice, CM ;
Ning, ZM ;
Rogers, J ;
Bentley, DR ;
Kwok, PY ;
Mardis, ER ;
Yeh, RT ;
Schultz, B ;
Cook, L ;
Davenport, R ;
Dante, M ;
Fulton, L ;
Hillier, L ;
Waterston, RH ;
McPherson, JD ;
Gilman, B ;
Schaffner, S ;
Van Etten, WJ ;
Reich, D ;
Higgins, J ;
Daly, MJ ;
Blumenstiel, B ;
Baldwin, J ;
Stange-Thomann, NS ;
Zody, MC ;
Linton, L ;
Lander, ES ;
Altshuler, D .
NATURE, 2001, 409 (6822) :928-933
[10]   Basecalling with LifeTrace [J].
Walther, D ;
Bartha, G ;
Morris, M .
GENOME RESEARCH, 2001, 11 (05) :875-888