Effective filtering strategies to improve data quality from population-based whole exome sequencing studies

Cited by: 97
Authors
Carson, Andrew R. [1, 2]
Smith, Erin N. [1, 2]
Matsui, Hiroko [1, 2]
Braekkan, Sigrid K. [3, 4]
Jepsen, Kristen [1, 2]
Hansen, John-Bjarne [3, 4]
Frazer, Kelly A. [1, 2, 5, 6, 7]
Affiliations
[1] Univ Calif San Diego, Dept Pediat, San Diego, CA 92103 USA
[2] Univ Calif San Diego, Rady Childrens Hosp, San Diego, CA 92103 USA
[3] Univ Tromso, Dept Clin Med, Hematol Res Grp, Tromso, Norway
[4] Univ Hosp North Norway, Div Internal Med, Tromso, Norway
[5] Univ Calif San Diego, Clin & Translat Res Inst, San Diego, CA 92103 USA
[6] Univ Tromso, Dept Clin Med, Tromso, Norway
[7] Univ Calif San Diego, Moores UCSD Canc Ctr, La Jolla, CA 92093 USA
Source
BMC BIOINFORMATICS | 2014 / Vol. 15
Funding
National Institutes of Health (USA);
Keywords
Next generation sequencing; Single nucleotide variants; Genotyping; Imputation; Genomics; SNP GENOTYPING ERRORS; MENDELIAN DISEASE; GENETIC ASSOCIATION; RARE VARIANTS; GENOME; MUTATIONS; IMPUTATION; DISCOVERY; CARCINOMA; FRAMEWORK;
DOI
10.1186/1471-2105-15-125
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Genotypes generated in next generation sequencing studies contain errors which can significantly impact the power to detect signals in common and rare variant association tests. These genotyping errors are not explicitly filtered by the standard GATK Variant Quality Score Recalibration (VQSR) tool and thus remain a source of errors in whole exome sequencing (WES) projects that follow GATK's recommended best practices. Therefore, additional data filtering methods are required to effectively remove these errors before performing association analyses with complex phenotypes. Here we empirically derive thresholds for genotype and variant filters that, when used in conjunction with the VQSR tool, achieve higher data quality than when using VQSR alone.
Results: The detailed filtering strategies improve the concordance of sequenced genotypes with array genotypes from 99.33% to 99.77%; improve the percent of discordant genotypes removed from 10.5% to 69.5%; and improve the Ti/Tv ratio from 2.63 to 2.75. We also demonstrate that managing batch effects by separating samples based on different target capture and sequencing chemistry protocols results in a final data set containing 40.9% more high-quality variants. In addition, imputation is an important component of WES studies and is used to estimate common variant genotypes to generate additional markers for association analyses. As such, we demonstrate filtering methods for imputed data that improve genotype concordance from 79.3% to 99.8% while removing 99.5% of discordant genotypes.
Conclusions: The described filtering methods are advantageous for large population-based WES studies designed to identify common and rare variation associated with complex diseases. Compared to data processed through standard practices, these strategies result in substantially higher quality data for common and rare association analyses.
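The three quality metrics named in the Results (genotype concordance with array genotypes, percent of discordant genotypes removed, and Ti/Tv ratio) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the genotype encoding (alternate-allele counts 0/1/2, with None for a missing or filtered call), the function names, and the toy data are all assumptions made for illustration, and the record above does not state the filter thresholds themselves.

```python
from typing import Dict, Iterable, Optional, Tuple

# Assumed encoding (hypothetical, not from the paper):
# genotype = alternate-allele count (0, 1, 2); None = missing or filtered out.
Genotype = Optional[int]
Key = Tuple[str, str]  # (sample_id, variant_id)
Calls = Dict[Key, Genotype]


def concordance(seq: Calls, array: Calls) -> float:
    """Fraction of genotypes called in both data sets that agree."""
    shared = [k for k, g in seq.items()
              if g is not None and array.get(k) is not None]
    if not shared:
        raise ValueError("no genotypes called in both data sets")
    return sum(seq[k] == array[k] for k in shared) / len(shared)


def discordant_removed(before: Calls, after: Calls, array: Calls) -> float:
    """Fraction of pre-filter discordant genotypes that the filter removed."""
    discordant = [k for k, g in before.items()
                  if g is not None and array.get(k) is not None
                  and g != array[k]]
    if not discordant:
        raise ValueError("no discordant genotypes before filtering")
    return sum(after.get(k) is None for k in discordant) / len(discordant)


TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Transition/transversion ratio over (ref, alt) pairs.

    Assumes biallelic SNVs with ref != alt; anything that is not a
    transition is counted as a transversion.
    """
    ti = tv = 0
    for ref, alt in snvs:
        if (ref.upper(), alt.upper()) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; ratio undefined")
    return ti / tv


# Toy data: one of four sequenced genotypes disagrees with the array,
# and a hypothetical filter sets that genotype to missing.
array_calls: Calls = {("s1", "v1"): 0, ("s1", "v2"): 1,
                      ("s2", "v1"): 2, ("s2", "v2"): 1}
seq_raw: Calls = {("s1", "v1"): 0, ("s1", "v2"): 2,
                  ("s2", "v1"): 2, ("s2", "v2"): 1}
seq_filtered: Calls = dict(seq_raw)
seq_filtered[("s1", "v2")] = None
```

With this toy data, concordance is computed only over genotypes that are called in both sets, so filtering a discordant call raises concordance while shrinking the comparison set. That is why the abstract reports concordance alongside the percent of discordant genotypes removed.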
Pages: 15