Quality Control and Quality Assurance in Genotypic Data for Genome-Wide Association Studies

Cited by: 335
Authors
Laurie, Cathy C. [1]
Doheny, Kimberly F. [2]
Mirel, Daniel B. [3]
Pugh, Elizabeth W. [2]
Bierut, Laura J. [4]
Bhangale, Tushar [1]
Boehm, Frederick [1]
Caporaso, Neil E. [5]
Cornelis, Marilyn C. [6]
Edenberg, Howard J. [7]
Gabriel, Stacy B. [3]
Harris, Emily L. [8]
Hu, Frank B. [6]
Jacobs, Kevin B. [5]
Kraft, Peter [9]
Landi, Maria Teresa [5]
Lumley, Thomas [1]
Manolio, Teri A. [10]
McHugh, Caitlin [1]
Painter, Ian
Paschall, Justin [11]
Rice, John P. [4]
Rice, Kenneth M. [1]
Zheng, Xiuwen [1]
Weir, Bruce S. [1]
Affiliations
[1] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
[2] Johns Hopkins Univ, Sch Med, Ctr Inherited Dis Res, Baltimore, MD USA
[3] Broad Inst & Harvard, Cambridge, MA USA
[4] Washington Univ, Sch Med, Dept Psychiat, St Louis, MO 63110 USA
[5] NCI, Div Canc Epidemiol & Genet, Bethesda, MD 20892 USA
[6] Harvard Univ, Dept Nutr, Harvard Sch Publ Hlth, Boston, MA 02115 USA
[7] Indiana Univ, Sch Med, Dept Biochem & Mol Biol, Indianapolis, IN USA
[8] NIDCR, Div Extramural Res, Bethesda, MD USA
[9] Harvard Univ, Program Mol & Genet Epidemiol, Harvard Sch Publ Hlth, Boston, MA 02115 USA
[10] NHGRI, Off Populat Gen, Bethesda, MD 20892 USA
[11] Natl Lib Med, Natl Ctr Biotechnol Informat, Bethesda, MD 20894 USA
Funding
National Institutes of Health (NIH), USA
Keywords
GWAS; DNA sample quality; genotyping artifact; Hardy-Weinberg equilibrium; chromosome aberration; DIFFERENTIAL BIAS; COLLABORATION; SUBSTRUCTURE; GENE
DOI
10.1002/gepi.20516
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Genome-wide scans of nucleotide variation in human subjects are providing an increasing number of replicated associations with complex disease traits. Most of the variants detected have small effects and, collectively, they account for a small fraction of the total genetic variance. Very large sample sizes are required to identify and validate findings. In this situation, even small sources of systematic or random error can cause spurious results or obscure real effects. The need for careful attention to data quality has been appreciated for some time in this field, and a number of strategies for quality control and quality assurance (QC/QA) have been developed. Here we extend these methods and describe a system of QC/QA for genotypic data in genome-wide association studies (GWAS). This system includes some new approaches that (1) combine analysis of allelic probe intensities and called genotypes to distinguish gender misidentification from sex chromosome aberrations, (2) detect autosomal chromosome aberrations that may affect genotype calling accuracy, (3) infer DNA sample quality from relatedness and allelic intensities, (4) use duplicate concordance to infer SNP quality, (5) detect genotyping artifacts from dependence of Hardy-Weinberg equilibrium test P-values on allelic frequency, and (6) demonstrate sensitivity of principal components analysis to SNP selection. The methods are illustrated with examples from the "Gene Environment Association Studies" (GENEVA) program. The results suggest several recommendations for QC/QA in the design and execution of GWAS. Genet. Epidemiol. 34:591-602, 2010. © 2010 Wiley-Liss, Inc.
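
As an illustration of item (5) in the abstract, the sketch below computes a Hardy-Weinberg equilibrium P-value and the minor allele frequency for one SNP from its three genotype counts. It is a minimal sketch that assumes a 1-degree-of-freedom chi-square test. The abstract does not say which HWE test the authors use, and an exact test is often preferred for rare alleles. The function names `hwe_chisq_pvalue` and `minor_allele_frequency` are hypothetical and chosen only for this example.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Return the frequency of the less common allele from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1.0 - freq_a)


def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg proportions for one SNP.

    Assumption for illustration: a large-sample chi-square test, not
    necessarily the test used in the paper.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        # Monomorphic SNP: the test is uninformative.
        return 1.0
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))
```

Pairing each SNP's P-value with its minor allele frequency, for example by plotting one against the other across all SNPs, is one way to look for the frequency dependence that the abstract describes as a sign of genotyping artifacts.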
Pages: 591-602
Page count: 12