Correcting for Sample Contamination in Genotype Calling of DNA Sequence Data

Cited by: 31
Authors
Flickinger, Matthew [1, 2]
Jun, Goo [1, 2, 3]
Abecasis, Goncalo R. [1, 2]
Boehnke, Michael [1, 2]
Kang, Hyun Min [1, 2]
Affiliations
[1] Univ Michigan, Dept Biostat, Sch Publ Hlth, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Sch Publ Hlth, Ctr Stat Genet, Ann Arbor, MI 48109 USA
[3] Univ Texas Hlth Sci Ctr Houston, Human Genet Ctr, Sch Publ Hlth, Houston, TX 77030 USA
DOI
10.1016/j.ajhg.2015.07.002
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 [Genetics];
Abstract
DNA sample contamination is a frequent problem in DNA sequencing studies and can result in genotyping errors and reduced power for association testing. We recently described methods to identify within-species DNA sample contamination based on sequencing read data, showed that our methods can reliably detect and estimate contamination levels as low as 1%, and suggested strategies to identify and remove contaminated samples from sequencing studies. Here we propose methods to model contamination during genotype calling as an alternative to removal of contaminated samples from further analyses. We compare our contamination-adjusted calls to calls that ignore contamination and to calls based on uncontaminated data. We demonstrate that, for moderate contamination levels (5%-20%), contamination-adjusted calls eliminate 48%-77% of the genotyping errors. For lower levels of contamination, our contamination correction methods produce genotypes nearly as accurate as those based on uncontaminated data. Our contamination correction methods are useful generally, but are particularly helpful for sample contamination levels from 2% to 20%.
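The abstract does not spell out the likelihood model, so the following is only a minimal sketch of one way contamination could be modeled during genotype calling. It assumes a biallelic site, a two-component mixture in which a fraction `alpha` of reads comes from a contaminating individual, a Hardy-Weinberg prior on the contaminating genotype, and a simplified error model in which a sequencing error flips the observed allele. All function and parameter names (`genotype_likelihoods`, `call_genotype`, `alpha`, `alt_freq`) are hypothetical and are not taken from the paper or its software.

```python
def hwe_prior(alt_freq):
    """Hardy-Weinberg probabilities for 0, 1, 2 copies of the alt allele."""
    q = alt_freq
    p = 1.0 - q
    return [p * p, 2.0 * p * q, q * q]


def read_prob(is_alt, alt_fraction, err):
    """Probability of one observed base given the expected alt-read fraction.

    Assumption: an error flips the base to the other allele (biallelic model).
    """
    p_alt_obs = alt_fraction * (1.0 - err) + (1.0 - alt_fraction) * err
    return p_alt_obs if is_alt else 1.0 - p_alt_obs


def genotype_likelihoods(reads, alpha, alt_freq):
    """Contamination-adjusted likelihoods for sample genotypes 0, 1, 2.

    reads    : list of (is_alt, err) pairs, one per read covering the site
    alpha    : assumed contamination fraction (0 means no contamination)
    alt_freq : population alt-allele frequency, used for the contaminant
    """
    prior = hwe_prior(alt_freq)
    likes = []
    for g1 in range(3):              # genotype of the intended sample
        total = 0.0
        for g2 in range(3):          # unknown genotype of the contaminant
            frac = (1.0 - alpha) * g1 / 2.0 + alpha * g2 / 2.0
            lik = 1.0
            for is_alt, err in reads:
                lik *= read_prob(is_alt, frac, err)
            total += prior[g2] * lik  # marginalize over the contaminant
        likes.append(total)
    return likes


def call_genotype(reads, alpha, alt_freq):
    """Return the genotype (0, 1, or 2 alt alleles) with the highest likelihood."""
    likes = genotype_likelihoods(reads, alpha, alt_freq)
    return max(range(3), key=lambda g: likes[g])


# 3 alt reads and 17 ref reads, each with base error probability 0.001
reads = [(True, 0.001)] * 3 + [(False, 0.001)] * 17
naive_call = call_genotype(reads, alpha=0.0, alt_freq=0.3)
adjusted_call = call_genotype(reads, alpha=0.1, alt_freq=0.3)
```

In this toy case the call that ignores contamination treats the three alt reads as evidence for a heterozygote, while the adjusted call at an assumed 10% contamination attributes them to the contaminant and returns homozygous reference. Setting `alpha=0.0` recovers the ordinary unadjusted likelihood, so the same routine covers both cases.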
Pages: 284-290
Number of pages: 7