Optimal selection of markers for validation or replication from genome-wide association studies

Cited by: 11
Authors
Greenwood, Celia M. T.
Rangrej, Jagadish
Sun, Lei
Affiliations
[1] Univ Toronto, Hosp Sick Children, Toronto, ON M5G 1L7, Canada
[2] Univ Toronto, Dept Publ Hlth Sci, Toronto, ON M5G 1L7, Canada
Keywords
false discovery rate; genome-wide; single nucleotide polymorphisms (SNPs); stratification; replication; multi-stage designs
DOI
10.1002/gepi.20220
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
With reductions in genotyping costs and the fast pace of improvements in genotyping technology, it is not uncommon for the individuals in a single study to undergo genotyping using several different platforms, where each platform may contain different numbers of markers selected via different criteria. For example, a set of cases and controls may be genotyped at markers in a small set of carefully selected candidate genes, and shortly thereafter, the same cases and controls may be used for a genome-wide single nucleotide polymorphism (SNP) association study. After such initial investigations, often, a subset of "interesting" markers is selected for validation or replication. Specifically, by validation, we refer to the investigation of associations between the selected subset of markers and the disease in independent data. However, it is not obvious how to choose the best set of markers for this validation. There may be a prior expectation that some sets of genotyping data are more likely to contain real associations. For example, it may be more likely for markers in plausible candidate genes to show disease associations than markers in a genome-wide scan. Hence, it would be desirable to select proportionally more markers from the candidate gene set. When a fixed number of markers are selected for validation, we propose an approach for identifying an optimal marker-selection configuration by basing the approach on minimizing the stratified false discovery rate. We illustrate this approach using a case-control study of colorectal cancer from Ontario, Canada, and we show that this approach leads to substantial reductions in the estimated false discovery rates in the Ontario dataset for the selected markers, as well as reductions in the expected false discovery rates for the proposed validation dataset.
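The idea of choosing how many markers to take from each stratum so that the stratified false discovery rate is minimized can be illustrated with a minimal sketch. This is not the authors' procedure; it assumes two strata, a simple plug-in FDR estimate (null proportion × number of markers in the stratum × p-value threshold, summed over strata and divided by the total number selected), and a conservative default null proportion of 1. The function names, stratum labels, and p-values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def estimated_fdr(sorted_p: Dict[str, List[float]],
                  counts: Dict[str, int],
                  pi0: Dict[str, float]) -> float:
    """Plug-in FDR estimate when the counts[s] smallest p-values are taken from stratum s."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    expected_false = 0.0
    for stratum, k in counts.items():
        if k == 0:
            continue
        # The k-th smallest p-value acts as the rejection threshold in this stratum.
        threshold = sorted_p[stratum][k - 1]
        expected_false += pi0[stratum] * len(sorted_p[stratum]) * threshold
    return min(1.0, expected_false / total)


def best_configuration(p_values: Dict[str, List[float]],
                       total_selected: int,
                       pi0: Optional[Dict[str, float]] = None
                       ) -> Tuple[Dict[str, int], float]:
    """Enumerate all splits of total_selected across two strata; return the lowest-FDR split."""
    strata = list(p_values)
    if len(strata) != 2:
        raise ValueError("this sketch handles exactly two strata")
    first, second = strata
    sorted_p = {s: sorted(p) for s, p in p_values.items()}
    # Assumption: with no prior estimate, treat every marker as potentially null (pi0 = 1).
    pi0 = pi0 or {s: 1.0 for s in strata}
    m1, m2 = len(sorted_p[first]), len(sorted_p[second])
    if not 0 < total_selected <= m1 + m2:
        raise ValueError("total_selected must be between 1 and the number of markers")

    best_counts: Dict[str, int] = {}
    best_fdr = float("inf")
    for k1 in range(max(0, total_selected - m2), min(total_selected, m1) + 1):
        counts = {first: k1, second: total_selected - k1}
        fdr = estimated_fdr(sorted_p, counts, pi0)
        if fdr < best_fdr:
            best_counts, best_fdr = counts, fdr
    return best_counts, best_fdr


# Hypothetical p-values: a small candidate-gene panel and a larger genome-wide set.
example = {
    "candidate": [0.001, 0.002, 0.5, 0.8],
    "genome_wide": [0.0001, 0.2, 0.3, 0.4, 0.6, 0.9, 0.95, 0.99],
}
config, fdr = best_configuration(example, total_selected=3)
print(config, round(fdr, 4))
```

With these made-up inputs, the search favours taking two markers from the candidate panel and one from the genome-wide set, because the smaller stratum contributes fewer expected false discoveries at a comparable threshold. Exhaustive enumeration is only practical for a small number of strata; a real analysis would also need a defensible estimate of the null proportion in each stratum.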
Pages: 396-407
Number of pages: 12