IMPROVING POPULATION-SPECIFIC ALLELE FREQUENCY ESTIMATES BY ADAPTING SUPPLEMENTAL DATA: AN EMPIRICAL BAYES APPROACH

Cited by: 10
Authors
Coram, Marc [1]
Tang, Hua [2]
Affiliations
[1] Stanford Univ, Dept Hlth Res & Policy, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Genet, Sch Med, Stanford, CA 94305 USA
Keywords
Empirical Bayes; allele frequency
DOI
10.1214/07-AOAS121
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
Estimation of the allele frequency at genetic markers is a key ingredient in biological and biomedical research, such as studies of human genetic variation or of the genetic etiology of heritable traits. As genetic data becomes increasingly available, investigators face a dilemma: when should data from other studies and population subgroups be pooled with the primary data? Pooling additional samples will generally reduce the variance of the frequency estimates; however, used inappropriately, pooled estimates can be severely biased due to population stratification. Because of this potential bias, most investigators avoid pooling, even for samples with the same ethnic background and residing on the same continent. Here, we propose an empirical Bayes approach for estimating allele frequencies of single nucleotide polymorphisms. This procedure adaptively incorporates genotypes from related samples, so that more similar samples have a greater influence on the estimates. In every example we have considered, our estimator achieves a mean squared error (MSE) that is smaller than either pooling or not, and sometimes substantially improves over both extremes. The bias introduced is small, as is shown by a simulation study that is carefully matched to a real data example. Our method is particularly useful when small groups of individuals are genotyped at a large number of markers, a situation we are likely to encounter in a genome-wide association study.
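The abstract gives no formulas, so the following is only a generic illustration of the idea of adaptive shrinkage, not the authors' estimator. It assumes a simple beta-binomial model in which the supplemental frequencies are treated as known, and it estimates a single divergence parameter across markers by the method of moments. The function name `eb_shrinkage` and all parameter names are hypothetical.

```python
def eb_shrinkage(primary_counts, n, supp_freqs):
    """Shrink per-marker primary frequencies toward supplemental ones.

    Illustrative sketch only (assumed model, not the published method):
      primary_counts[j] ~ Binomial(n, p_j),
      p_j has mean q_j = supp_freqs[j] and variance f * q_j * (1 - q_j),
    where f in [0, 1] measures how far the two samples have diverged.
    Returns (w, estimates), with w the weight given to the primary data.
    """
    raw = [x / n for x in primary_counts]

    # Method of moments: under the assumed model,
    # E[(x/n - q)^2] = q(1 - q) * (1/n + f * (1 - 1/n)).
    num = sum((r - q) ** 2 for r, q in zip(raw, supp_freqs))
    den = sum(q * (1 - q) for q in supp_freqs)
    if den == 0 or n < 2:
        # No information about divergence: fall back to the raw estimates.
        return 1.0, raw

    f = (num / den - 1 / n) / (1 - 1 / n)
    f = min(max(f, 0.0), 1.0)  # clip to the valid range

    # Posterior-mean weight on the primary data: w = n / (n + M),
    # with prior strength M = (1 - f) / f, rewritten to avoid dividing by f.
    w = n * f / (n * f + 1 - f)
    return w, [w * r + (1 - w) * q for r, q in zip(raw, supp_freqs)]
```

In this sketch, similar samples yield a small `f`, so the weight `w` on the primary data is small and the estimates lean on the supplemental frequencies. Divergent samples push `w` toward 1, which recovers the unpooled estimates.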
Pages: 459-479
Number of pages: 21