Assessing batch effects of genotype calling algorithm BRLMM for the Affymetrix GeneChip Human Mapping 500 K array set using 270 HapMap samples

Cited by: 36
Authors
Hong, Huixiao [1 ]
Su, Zhenqiang [1 ]
Ge, Weigong [1 ]
Shi, Leming [1 ]
Perkins, Roger [2 ]
Fang, Hong [2 ]
Xu, Joshua [2 ]
Chen, James J. [3 ]
Han, Tao [1 ]
Kaput, Jim [3 ]
Fuscoe, James C. [1 ]
Tong, Weida [1 ]
Affiliations
[1] US FDA, Natl Ctr Toxicol Res, Div Syst Toxicol, Jefferson, AR 72079 USA
[2] US FDA, Natl Ctr Toxicol Res, ICF Int Co, Z Tech Corp, Jefferson, AR 72079 USA
[3] US FDA, Natl Ctr Toxicol Res, Div Personalized Nutr & Med, Jefferson, AR 72079 USA
DOI
10.1186/1471-2105-9-S9-S17
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Genome-wide association studies (GWAS) aim to identify genetic variants (usually single nucleotide polymorphisms [SNPs]) across the entire human genome that are associated with phenotypic traits such as disease status and drug response. Highly accurate and reproducible genotype calling is paramount, since errors introduced by calling algorithms can inflate the number of false associations between genotype and phenotype. Most genotype calling algorithms currently used for GWAS are based on multiple arrays. Because a GWAS generates hundreds of gigabytes (GB) of raw data, the samples are typically partitioned into batches, each containing a subset of the entire dataset, for genotype calling. High call rates and accuracies have been achieved. However, the effects of batch size (i.e., the number of chips analyzed together) and of batch composition (i.e., the choice of chips in a batch) on call rate and accuracy, as well as the propagation of these effects into the significantly associated SNPs identified, have not been investigated. In this paper, we analyzed the effects of both batch size and batch composition on the genotype calling algorithm BRLMM using raw data of 270 HapMap samples analyzed with the Affymetrix Human Mapping 500 K array set.
Results: Using data from 270 HapMap samples interrogated with the Affymetrix Human Mapping 500 K array set, three different batch sizes and three different batch compositions were used for genotyping with the BRLMM algorithm. Comparative analysis of the calling results and the corresponding lists of significant SNPs identified through association analysis revealed that both batch size and composition affected the genotype calling results and the significantly associated SNPs. Batch size and batch composition effects were more severe on samples and SNPs with lower call rates than on those with higher call rates, and on heterozygous genotype calls than on homozygous genotype calls.
Conclusion: Batch size and composition affect the genotype calling results in GWAS using BRLMM. The larger the differences in batch sizes, the larger the effect. The more homogeneous the samples in the batches, the more consistent the genotype calls. The inconsistency propagates to the lists of significantly associated SNPs identified in downstream association analysis. Thus, uniform and large batch sizes should be used to make genotype calls for GWAS. In addition, samples of high homogeneity should be placed into the same batch.
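The two quantities the abstract compares across batches, call rate and consistency of calls, can be illustrated with a minimal sketch. This is not the authors' code or the BRLMM algorithm. The function names, the use of `None` for a no-call, and the toy genotype lists are all assumptions made for illustration.

```python
from typing import List, Optional

# Assumption: a genotype call is "AA", "AB" or "BB"; None marks a no-call.
Call = Optional[str]


def call_rate(calls: List[Call]) -> float:
    """Fraction of genotypes that received a call (i.e. are not None)."""
    if not calls:
        raise ValueError("calls must not be empty")
    return sum(c is not None for c in calls) / len(calls)


def concordance(calls_a: List[Call], calls_b: List[Call]) -> float:
    """Fraction of identical genotypes among positions called in both runs."""
    if len(calls_a) != len(calls_b):
        raise ValueError("call lists must have the same length")
    both_called = [(a, b) for a, b in zip(calls_a, calls_b)
                   if a is not None and b is not None]
    if not both_called:
        return float("nan")  # nothing to compare
    return sum(a == b for a, b in both_called) / len(both_called)


# Hypothetical calls for the same six SNPs of one sample, obtained by
# running the calling step with two different batch configurations.
calls_small_batch: List[Call] = ["AA", "AB", None, "BB", "AB", "AA"]
calls_large_batch: List[Call] = ["AA", "AA", "AB", "BB", "AB", "AA"]

print(call_rate(calls_small_batch))
print(call_rate(calls_large_batch))
print(concordance(calls_small_batch, calls_large_batch))
```

In this toy data, the second SNP is called heterozygous in one run and homozygous in the other, which is the kind of discordance the abstract reports as more frequent for heterozygous calls.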
Pages: 13