Power in the Phenotypic Extremes: A Simulation Study of Power in Discovery and Replication of Rare Variants

Cited by: 77
Authors
Guey, Lin T. [3 ]
Kravic, Jasmina [4 ,5 ]
Melander, Olle [6 ]
Burtt, Noel P. [2 ]
Laramie, Jason M. [3 ]
Lyssenko, Valeriya [4 ,5 ]
Jonsson, Anna [4 ,5 ]
Lindholm, Eero [4 ,5 ]
Tuomi, Tiinamaija [7 ,8 ]
Isomaa, Bo [8 ,9 ]
Nilsson, Peter [10 ]
Almgren, Peter [4 ,5 ]
Kathiresan, Sekar [2 ,11 ,12 ,13 ]
Groop, Leif [4 ,5 ]
Seymour, Albert B. [3 ]
Altshuler, David [2 ,11 ,13 ,14 ,15 ]
Voight, Benjamin F. [1 ,2 ,11 ,13 ]
Affiliations
[1] Broad Inst Harvard, Cambridge Ctr 7, Cambridge, MA 02144 USA
[2] MIT, Cambridge, MA 02139 USA
[3] Pfizer Biotherapeut, Appl Quantitat Genotherapeut, Cambridge, MA USA
[4] Lund Univ, Dept Clin Sci Diabet & Endocrinol, Malmo, Sweden
[5] Lund Univ, Ctr Diabet, Malmo, Sweden
[6] Lund Univ, Lund Univ Diabet Ctr, Clin Res Ctr, Malmo Univ Hosp, S-22100 Lund, Sweden
[7] Univ Helsinki, Dept Med, Helsinki Univ Hosp, Helsinki, Finland
[8] Folkhalsan Res Ctr, Helsinki, Finland
[9] Malmska Municipal Hlth Ctr & Hosp, Pietarsaari, Finland
[10] Lund Univ, Dept Clin Sci, Malmo, Sweden
[11] Massachusetts Gen Hosp, Ctr Human Genet Res, Boston, MA 02114 USA
[12] Massachusetts Gen Hosp, Cardiovasc Res Ctr, Boston, MA 02114 USA
[13] Harvard Univ, Sch Med, Dept Med, Boston, MA USA
[14] Harvard Univ, Sch Med, Dept Genet, Boston, MA USA
[15] Massachusetts Gen Hosp, Diabet Unit, Boston, MA 02114 USA
Funding
Swedish Research Council;
Keywords
liability ascertainment; next-generation sequencing; variant discovery; replication of association; phenotype extremes; GENOME-WIDE ASSOCIATION; QUANTITATIVE TRAIT LOCI; CORONARY-HEART-DISEASE; DISCORDANT SIB PAIRS; COMMON DISEASES; PLASMA-LEVELS; RISK-FACTORS; PREDICTION; CONTRIBUTE; ALLELES;
DOI
10.1002/gepi.20572
Chinese Library Classification number
Q3 [Genetics];
Subject classification codes
071007 ; 090102 ;
Abstract
Next-generation sequencing technologies are making it possible to study the role of rare variants in human disease. Many studies balance statistical power with cost-effectiveness by (a) sampling from phenotypic extremes and (b) utilizing a two-stage design. Two-stage designs include a broad-based discovery phase and selection of a subset of potential causal genes/variants to be further examined in independent samples. We evaluate three parameters: first, the gain in statistical power due to extreme sampling to discover causal variants; second, the informativeness of initial (Phase 1) association statistics to select genes/variants for follow-up; third, the impact of extreme and random sampling in (Phase 2) replication. We present a quantitative method to select individuals from the phenotypic extremes of a binary trait, and simulate disease association studies under a variety of sample sizes and sampling schemes. First, we find that while studies sampling from extremes have excellent power to discover rare variants, they have limited power to associate them with phenotype, suggesting high false-negative rates for upcoming studies. Second, consistent with previous studies, we find that the effect sizes estimated in these studies are expected to be systematically larger than the overall population effect size; in a well-cited lipids study, we estimate the reported effect to be twofold larger. Third, replication studies require large samples from the general population to have sufficient power; extreme sampling could reduce the required sample size as much as fourfold. Our observations offer practical guidance for the design and interpretation of studies that utilize extreme sampling. Genet. Epidemiol. 35: 236-246, 2011. (c) 2011 Wiley-Liss, Inc.
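The second finding in the abstract (effect sizes estimated from phenotypic extremes tend to exceed the population effect size) can be illustrated with a small simulation. The sketch below is not the authors' simulation framework: it assumes a simple additive normal liability model, and every function name and parameter value (carrier frequency, effect in liability standard deviations, prevalence, tail fraction) is hypothetical and chosen only for illustration.

```python
import random


def simulate_population(n, carrier_freq, effect_sd, seed=0):
    """Return (liability, is_carrier) pairs.

    Assumed model: liability = N(0, 1) noise + effect_sd for carriers
    of a single rare variant. This is an illustrative assumption.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        carrier = rng.random() < carrier_freq
        liability = rng.gauss(0.0, 1.0) + (effect_sd if carrier else 0.0)
        people.append((liability, carrier))
    return people


def odds_ratio(cases, controls):
    """Carrier odds ratio with a 0.5 continuity correction in each cell."""
    case_carriers = sum(1 for _, c in cases if c)
    ctrl_carriers = sum(1 for _, c in controls if c)
    a = case_carriers + 0.5                   # carriers among cases
    b = len(cases) - case_carriers + 0.5      # non-carriers among cases
    c = ctrl_carriers + 0.5                   # carriers among controls
    d = len(controls) - ctrl_carriers + 0.5   # non-carriers among controls
    return (a * d) / (b * c)


def compare_designs(n=100_000, carrier_freq=0.01, effect_sd=1.0,
                    prevalence=0.10, tail=0.01, seed=1):
    """Return (population OR, extreme-sampling OR) for one simulated cohort."""
    people = sorted(
        simulate_population(n, carrier_freq, effect_sd, seed),
        key=lambda p: p[0],
    )
    # Population design: top `prevalence` fraction of liability are cases,
    # everyone else is a control.
    n_cases = int(n * prevalence)
    or_population = odds_ratio(people[-n_cases:], people[:-n_cases])
    # Extreme design: top `tail` fraction versus bottom `tail` fraction.
    k = int(n * tail)
    or_extreme = odds_ratio(people[-k:], people[:k])
    return or_population, or_extreme


if __name__ == "__main__":
    or_pop, or_ext = compare_designs()
    print(f"population OR: {or_pop:.1f}, extreme-sampling OR: {or_ext:.1f}")
```

Under these assumed settings the odds ratio estimated from the two tails is much larger than the one estimated from the full population, which is the direction of bias the abstract describes; the magnitude depends entirely on the illustrative parameters.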
Pages: 236 / 246
Page count: 11
Related papers
47 in total
[1]   Agresti A, 2013, Categorical Data Analysis, 3rd ed.
[2]   Medical sequencing at the extremes of human body mass [J].
Ahituv, Nadav ;
Kavaslar, Nihan ;
Schackwitz, Wendy ;
Ustaszewska, Anna ;
Martin, Joel ;
Hebert, Sybil ;
Doelle, Heather ;
Ersoy, Baran ;
Kryukov, Gregory ;
Schmidt, Steffen ;
Yosef, Nir ;
Ruppin, Eytan ;
Sharan, Roded ;
Vaisse, Christian ;
Sunyaev, Shamil ;
Dent, Robert ;
Cohen, Jonathan ;
McPherson, Ruth ;
Pennacchio, Len A. .
AMERICAN JOURNAL OF HUMAN GENETICS, 2007, 80 (04) :779-791
[3]   Common variants in the TCF7L2 gene help to differentiate autoimmune from non-autoimmune diabetes in young (15-34 years) but not in middle-aged (40-59 years) diabetic patients [J].
Bakhtadze, E. ;
Cervin, C. ;
Lindholm, E. ;
Borg, H. ;
Nilsson, P. ;
Arnqvist, H. J. ;
Bolinder, J. ;
Eriksson, J. W. ;
Gudbjornsdottir, S. ;
Nystrom, L. ;
Agardh, C. -D. ;
Landin-Olsson, M. ;
Sundkvist, G. ;
Groop, L. C. .
DIABETOLOGIA, 2008, 51 (12) :2224-2232
[4]   No contribution of angiotensin-converting enzyme (ACE) gene variants to severe obesity: a model for comprehensive case/control and quantitative cladistic analysis of ACE in human diseases [J].
Bell, Christopher G. ;
Meyre, David ;
Petretto, Enrico ;
Levy-Marchal, Claire ;
Hercberg, Serge ;
Charles, Marie Aline ;
Boyle, Cliona ;
Weill, Jacques ;
Tauber, Maite ;
Mein, Charles A. ;
Aitman, Timothy J. ;
Froguel, Philippe ;
Walley, Andrew J. .
EUROPEAN JOURNAL OF HUMAN GENETICS, 2007, 15 (03) :320-327
[5]   Common and rare variants in multifactorial susceptibility to common diseases [J].
Bodmer, Walter ;
Bonilla, Carolina .
NATURE GENETICS, 2008, 40 (06) :695-701
[6]   Software for Generating Liability Distributions for Pedigrees Conditional on Their Observed Disease States and Covariates [J].
Campbell, Desmond D. ;
Sham, Pak C. ;
Knight, Jo ;
Wickham, Harvey ;
Landau, Sabine .
GENETIC EPIDEMIOLOGY, 2010, 34 (02) :159-170
[7]   Estimating penetrance from family data using a retrospective likelihood when ascertainment depends on genotype and age of onset [J].
Carayol, J ;
Bonaïti-Pellié, C .
GENETIC EPIDEMIOLOGY, 2004, 27 (02) :109-117
[8]   Genetic similarities between latent autoimmune diabetes in adults, type 1 diabetes, and type 2 diabetes [J].
Cervin, Camilla ;
Lyssenko, Valeriya ;
Bakhtadze, Ekaterine ;
Lindholm, Eero ;
Nilsson, Peter ;
Tuomi, Tiinamaija ;
Cilio, Corrado M. ;
Groop, Leif .
DIABETES, 2008, 57 (05) :1433-1437
[9]   Conditional likelihood inference under complex ascertainment using data augmentation [J].
Clayton, D .
BIOMETRIKA, 2003, 90 (04) :976-981
[10]   Sequence variations in PCSK9, low LDL, and protection against coronary heart disease [J].
Cohen, JC ;
Boerwinkle, E ;
Mosley, TH ;
Hobbs, HH .
NEW ENGLAND JOURNAL OF MEDICINE, 2006, 354 (12) :1264-1272