Size matters: just how big is BIG? Quantifying realistic sample size requirements for human genome epidemiology

Cited by: 168
Authors
Burton, Paul R. [1, 2, 3]
Hansell, Anna L. [4]
Fortier, Isabel [3, 5]
Manolio, Teri A. [6]
Khoury, Muin J. [3, 7]
Little, Julian [3, 8]
Elliott, Paul [4]
Affiliations
[1] Univ Leicester, Dept Hlth Sci, Leicester LE1 7RH, Leics, England
[2] Univ Leicester, Dept Genet, Leicester LE1 7RH, Leics, England
[3] Univ Montreal, P3G, Montreal, PQ H3C 3J7, Canada
[4] Univ London Imperial Coll Sci Technol & Med, Dept Epidemiol & Publ Hlth, London, England
[5] Univ Montreal, Dept Med Sociale & Prevent, Montreal, PQ, Canada
[6] NHGRI, NIH, Bethesda, MD 20892 USA
[7] Ctr Dis Control & Prevent, Natl Off Publ Hlth Genom, Atlanta, GA USA
[8] Univ Ottawa, Dept Epidemiol & Community Med, Ottawa, ON, Canada
Funding
Wellcome Trust (UK); Medical Research Council (UK)
Keywords
Human genome epidemiology; biobank; sample size; statistical power; simulation studies; measurement error; reliability; aetiological heterogeneity; WIDE ASSOCIATION SCAN; FACTOR-H POLYMORPHISM; GENETIC EPIDEMIOLOGY; MENDELIAN RANDOMIZATION; COLORECTAL-CANCER; COMMON VARIANTS; COMPLEX DISEASE; TAG SNPS; SUSCEPTIBILITY; RISK;
DOI
10.1093/ije/dyn147
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Discipline classification code
1004; 120402
Abstract
Background: Despite earlier doubts, a string of recent successes indicates that if sample sizes are large enough, it is possible, both in theory and in practice, to identify and replicate genetic associations with common complex diseases. But human genome epidemiology is expensive and, from a strategic perspective, it is still unclear what 'large enough' really means. This question has critical implications for governments, funding agencies, bioscientists and the tax-paying public. Difficult strategic decisions with imposing price tags and important opportunity costs must be taken.

Methods: Conventional power calculations for case-control studies disregard many basic elements of analytic complexity (e.g. errors in clinical assessment, and the impact of unmeasured aetiological determinants) and can seriously underestimate true sample size requirements. This article describes, and applies, a rigorous simulation-based approach to power calculation that deals more comprehensively with analytic complexity and has been implemented on the web as ESPRESSO (www.p3gobservatory.org/powercalculator.htm).

Results: Using this approach, the article explores the realistic power profile of stand-alone and nested case-control studies in a variety of settings and provides a robust quantitative foundation for determining the required sample size both of individual biobanks and of large disease-based consortia. Despite universal acknowledgment of the importance of large sample sizes, our results suggest that contemporary initiatives are still, at best, at the lower end of the range of desirable sample size. Insufficient power remains particularly problematic for studies exploring gene-gene or gene-environment interactions.

Discussion: Sample size calculation must be both accurate and realistic, and we must continue to strengthen national and international cooperation in the design, conduct, harmonization and integration of studies in human genome epidemiology.
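
To make the idea of simulation-based power calculation concrete, the following is a minimal Python sketch under stated assumptions. It is not the ESPRESSO algorithm, whose details are not given in this record. The function name `simulate_power` and all of its parameters are hypothetical. The sketch assumes a binary exposure (for example, carrier status), a Wald test on the log odds ratio, and a deliberately simple model of clinical assessment error in which a fraction `1 - case_ppv` of recruited "cases" are in fact unaffected and therefore have the control exposure prevalence.

```python
import math
import random
from statistics import NormalDist


def simulate_power(n_cases, n_controls, exposure_prev, odds_ratio,
                   case_ppv=1.0, alpha=0.05, n_sims=300, seed=1):
    """Estimate case-control power by Monte Carlo simulation (illustrative only).

    exposure_prev: exposure prevalence among controls (assumed to equal the
                   prevalence among unaffected people).
    case_ppv:      assumed proportion of recruited cases who are truly affected.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    # Exposure probability among true cases implied by the odds ratio.
    odds_cases = odds_ratio * exposure_prev / (1 - exposure_prev)
    p_true_cases = odds_cases / (1 + odds_cases)
    # Misclassified "cases" are unaffected, so they dilute the case exposure rate.
    p_obs_cases = case_ppv * p_true_cases + (1 - case_ppv) * exposure_prev

    rejections = 0
    for _ in range(n_sims):
        # Simulate one study: exposed counts among observed cases and controls.
        a = sum(rng.random() < p_obs_cases for _ in range(n_cases))
        c = sum(rng.random() < exposure_prev for _ in range(n_controls))
        b, d = n_cases - a, n_controls - c
        # Add 0.5 to each cell (Haldane-Anscombe) so empty cells cannot break the logs.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
        log_or = math.log(a * d / (b * c))
        se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        # Two-sided Wald test of log(OR) = 0.
        if abs(log_or / se) > z_crit:
            rejections += 1
    # Power estimate: proportion of simulated studies that reject the null.
    return rejections / n_sims


if __name__ == "__main__":
    for ppv in (1.0, 0.7):
        power = simulate_power(1000, 1000, exposure_prev=0.2,
                               odds_ratio=1.4, case_ppv=ppv)
        print(f"case_ppv={ppv}: estimated power = {power:.2f}")
```

In this toy setting, lowering `case_ppv` pulls the observed odds ratio towards 1 and reduces the estimated power. This is the kind of effect the abstract says conventional calculations disregard. A fuller treatment would also model exposure measurement error and unmeasured aetiological heterogeneity, which this sketch omits.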
Pages: 263-273
Number of pages: 11