Empirical Bayes analysis of single nucleotide polymorphisms

Cited by: 15
Authors
Schwender, Holger [1]
Ickstadt, Katja [1]
Affiliations
[1] Dortmund Univ Technol, Collaborat Res Ctr 475, Fac Stat, D-44221 Dortmund, Germany
DOI
10.1186/1471-2105-9-144
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: An important goal of whole-genome studies concerned with single nucleotide polymorphisms (SNPs) is the identification of SNPs associated with a covariate of interest such as the case-control status or the type of cancer. Since these studies often comprise the genotypes of hundreds of thousands of SNPs, methods are required that can cope with the corresponding multiple testing problem. For the analysis of gene expression data, approaches such as the empirical Bayes analysis of microarrays have been developed particularly for the detection of genes associated with the response. However, the empirical Bayes analysis of microarrays has only been suggested for binary responses when considering expression values, i.e. continuous predictors. Results: In this paper, we propose a modification of this empirical Bayes analysis that can be used to analyze high-dimensional categorical SNP data. This approach, along with a generalized version of the original empirical Bayes method, is available in the R package siggenes (version 1.10.0 and later), which can be downloaded from http://www.bioconductor.org. Conclusion: As applications to two subsets of the HapMap data show, the empirical Bayes analysis of microarrays can be used not only to analyze continuous gene expression data but also to analyze categorical SNP data, where the response is not restricted to be binary. In association studies in which typically several tens to a few hundred SNPs are considered, our approach can furthermore be employed to test interactions of SNPs. Moreover, the posterior probabilities resulting from the empirical Bayes analysis of (prespecified) interactions/genotypes can also be used to quantify the importance of these interactions.
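As a rough illustration of the kind of computation the abstract refers to, the following Python sketch scores each SNP with a Pearson chi-square statistic on its genotype-by-class count table and turns the scores into empirical Bayes posterior probabilities of the form p1(z) = 1 - p0 * f0(z) / f(z). Everything here is an assumption made for illustration and is not taken from the paper or from siggenes: the function names (`chi2_stat`, `chi2_pdf`, `posterior`) are hypothetical, the choice of the Pearson chi-square statistic is assumed, the mixture density f is estimated with a plain histogram, and the prior null proportion p0 defaults to the conservative value 1.

```python
import math


def chi2_stat(table):
    """Pearson chi-square statistic for a genotype-by-class count table.

    `table` is a list of rows (genotypes), each a list of counts per class.
    """
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    stat = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            exp = row[i] * col[j] / n
            if exp > 0:  # skip empty rows/columns
                stat += (obs - exp) ** 2 / exp
    return stat


def chi2_pdf(x, df):
    """Density of the chi-square distribution with `df` degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** (df / 2 - 1) * math.exp(-x / 2) / (2 ** (df / 2) * math.gamma(df / 2))


def posterior(stats, df, p0=1.0, n_bins=10):
    """Posterior probability p1(z) = 1 - p0 * f0(z) / f(z) for each statistic.

    f0 is the chi-square null density; f is estimated by an equal-width
    histogram of the observed statistics (an illustrative simplification).
    Both densities are evaluated per bin, f0 at the bin midpoint.
    """
    hi = max(stats)
    if hi <= 0:
        return [0.0] * len(stats)
    width = hi / n_bins
    bins = [min(int(z / width), n_bins - 1) for z in stats]
    counts = [0] * n_bins
    for b in bins:
        counts[b] += 1
    out = []
    for b in bins:
        f = counts[b] / (len(stats) * width)
        f0 = chi2_pdf((b + 0.5) * width, df)
        out.append(min(1.0, max(0.0, 1.0 - p0 * f0 / f)))
    return out
```

For a SNP with three genotypes and K classes, `df` would be (3 - 1)(K - 1) under this assumed statistic. SNPs whose posterior probability exceeds a chosen threshold (for example 0.9) would be the ones called significant. In practice a smoother density estimate than a histogram is preferable.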
Pages: 15