Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers

被引:126
作者
Han, Buhm [1 ]
Kang, Hyun Min [1 ]
Eskin, Eleazar [2 ,3 ]
机构
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
[3] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90024 USA
基金
美国国家卫生研究院; 美国国家科学基金会;
关键词
GENOME-WIDE ASSOCIATION; LINKAGE DISEQUILIBRIUM; GENETIC ASSOCIATION; STATISTICAL SIGNIFICANCE; HAPLOTYPE MAP; P-VALUES; INFORMATION; SIMULATION; INFERENCE; GENOTYPES;
D O I
10.1371/journal.pgen.1000456
中图分类号
Q3 [遗传学];
学科分类号
071007 ; 090102 ;
摘要
With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies-SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu.
引用
收藏
页数:13
相关论文
共 45 条
[1]   A haplotype map of the human genome [J].
Altshuler, D ;
Brooks, LD ;
Chakravarti, A ;
Collins, FS ;
Daly, MJ ;
Donnelly, P ;
Gibbs, RA ;
Belmont, JW ;
Boudreau, A ;
Leal, SM ;
Hardenbol, P ;
Pasternak, S ;
Wheeler, DA ;
Willis, TD ;
Yu, FL ;
Yang, HM ;
Zeng, CQ ;
Gao, Y ;
Hu, HR ;
Hu, WT ;
Li, CH ;
Lin, W ;
Liu, SQ ;
Pan, H ;
Tang, XL ;
Wang, J ;
Wang, W ;
Yu, J ;
Zhang, B ;
Zhang, QR ;
Zhao, HB ;
Zhao, H ;
Zhou, J ;
Gabriel, SB ;
Barry, R ;
Blumenstiel, B ;
Camargo, A ;
Defelice, M ;
Faggart, M ;
Goyette, M ;
Gupta, S ;
Moore, J ;
Nguyen, H ;
Onofrio, RC ;
Parkin, M ;
Roy, J ;
Stahl, E ;
Winchester, E ;
Ziaugra, L ;
Shen, Y .
NATURE, 2005, 437 (7063) :1299-1320
[2]  
[Anonymous], 1993, Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment
[3]  
[Anonymous], COMP SCI STAT VOL 25
[4]   CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[5]   Estimating coverage and power for genetic association studies using near-complete variation data [J].
Bhangale, Tushar R. ;
Rieder, Mark J. ;
Nickerson, Deborah A. .
NATURE GENETICS, 2008, 40 (07) :841-843
[6]   PRESTO: Rapid calculation of order statistic distributions and multiple-testing adjusted P-values via permutation for one and two-stage genetic association studies [J].
Browning, Brian L. .
BMC BIOINFORMATICS, 2008, 9 (1)
[7]   Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering [J].
Browning, Sharon R. ;
Browning, Brian L. .
AMERICAN JOURNAL OF HUMAN GENETICS, 2007, 81 (05) :1084-1097
[8]   Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls [J].
Burton, Paul R. ;
Clayton, David G. ;
Cardon, Lon R. ;
Craddock, Nick ;
Deloukas, Panos ;
Duncanson, Audrey ;
Kwiatkowski, Dominic P. ;
McCarthy, Mark I. ;
Ouwehand, Willem H. ;
Samani, Nilesh J. ;
Todd, John A. ;
Donnelly, Peter ;
Barrett, Jeffrey C. ;
Davison, Dan ;
Easton, Doug ;
Evans, David ;
Leung, Hin-Tak ;
Marchini, Jonathan L. ;
Morris, Andrew P. ;
Spencer, Chris C. A. ;
Tobin, Martin D. ;
Attwood, Antony P. ;
Boorman, James P. ;
Cant, Barbara ;
Everson, Ursula ;
Hussey, Judith M. ;
Jolley, Jennifer D. ;
Knight, Alexandra S. ;
Koch, Kerstin ;
Meech, Elizabeth ;
Nutland, Sarah ;
Prowse, Christopher V. ;
Stevens, Helen E. ;
Taylor, Niall C. ;
Walters, Graham R. ;
Walker, Neil M. ;
Watkins, Nicholas A. ;
Winzer, Thilo ;
Jones, Richard W. ;
McArdle, Wendy L. ;
Ring, Susan M. ;
Strachan, David P. ;
Pembrey, Marcus ;
Breen, Gerome ;
St Clair, David ;
Caesar, Sian ;
Gordon-Smith, Katherine ;
Jones, Lisa ;
Fraser, Christine ;
Green, Elain K. .
NATURE, 2007, 447 (7145) :661-678
[9]   A simple correction for multiple comparisons in interval mapping genome scans [J].
Cheverud, JM .
HEREDITY, 2001, 87 (1) :52-58
[10]   A DNA polymorphism discovery resource for research on human genetic variation [J].
Collins, FS ;
Brooks, LD ;
Chakravarti, A .
GENOME RESEARCH, 1998, 8 (12) :1229-1231