Modeling and E-M estimation of haplotype-specific relative risks from genotype data for a case-control study of unrelated individuals

Cited by: 207
Authors
Stram, DO
Pearce, CL
Bretsky, P
Freedman, M
Hirschhorn, JN
Altshuler, D
Kolonel, LN
Henderson, BE
Thomas, DC
Affiliations
[1] Univ So Calif, Dept Prevent Med, Los Angeles, CA 90033 USA
[2] MIT, Whitehead Inst, Ctr Genome Res, Cambridge, MA 02139 USA
[3] Harvard Univ, Sch Med, Dept Genet, Cambridge, MA 02138 USA
[4] Harvard Univ, Sch Med, Childrens Hosp, Div Genet, Cambridge, MA 02138 USA
[5] Harvard Univ, Sch Med, Childrens Hosp, Div Endocrinol, Cambridge, MA 02138 USA
[6] Harvard Univ, Sch Med, Dept Med, Cambridge, MA 02138 USA
[7] Massachusetts Gen Hosp, Dept Mol Biol, Boston, MA USA
[8] Massachusetts Gen Hosp, Diabet Unit, Boston, MA USA
[9] Univ Hawaii, Canc Res Ctr Hawaii, Honolulu, HI 96813 USA
Keywords
haplotypes; case-control studies; linkage disequilibrium; candidate gene analysis; htSNP;
DOI
10.1159/000073202
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
The US National Cancer Institute has recently sponsored the formation of a Cohort Consortium (http://2002.cancer.gov/scpgenes.htm) to facilitate the pooling of data on very large numbers of people, concerning the effects of genes and environment on cancer incidence. One likely goal of these efforts will be to generate a large population-based case-control series for which a number of candidate genes will be investigated using SNP haplotype as well as genotype analysis. The goal of this paper is to outline the issues involved in choosing a method of estimating haplotype-specific risks for such data that is technically appropriate and yet attractive to epidemiologists who are already comfortable with odds ratios and logistic regression. Our interest is to develop and evaluate extensions of methods, based on haplotype imputation, that have been recently described (Schaid et al., Am J Hum Genet, 2002, and Zaykin et al., Hum Hered, 2002) as providing score tests of the null hypothesis of no effect of SNP haplotypes upon risk, so that they may be used for more complex tasks, such as providing confidence intervals and tests of equivalence of haplotype-specific risks in two or more separate populations. In order to do so we (1) develop a cohort approach towards odds ratio analysis by expanding the E-M algorithm to provide maximum likelihood estimates of haplotype-specific odds ratios as well as genotype frequencies; (2) show how to correct the cohort approach to give essentially unbiased estimates for population-based or nested case-control studies by incorporating the probability of selection as a case or control into the likelihood, based on a simplified model of case and control selection; and (3) finally, in an example data set (CYP17 and breast cancer, from the Multiethnic Cohort Study), compare likelihood-based confidence interval estimates from the two methods with each other, and with the use of the single-imputation approach of Zaykin et al. applied under both null and alternative hypotheses. We conclude that so long as haplotypes are well predicted by SNP genotypes (we use the R_h^2 criterion of Stram et al. [1]) the differences between the three methods are very small, and in particular that the single-imputation method may be expected to work extremely well. Copyright (C) 2003 S. Karger AG, Basel.
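The abstract does not give algorithmic detail, so the following is only a minimal sketch of the two general ingredients it names: E-M estimation of haplotype frequencies from unphased SNP genotypes, and the expected haplotype count per person that a single-imputation analysis could use as a regression covariate. It is not the authors' implementation and does not estimate odds ratios or model case-control selection. The restriction to two biallelic SNPs, the Hardy-Weinberg assumption, and all function names are assumptions made for illustration.

```python
from itertools import product

# All haplotypes over two biallelic SNPs (illustrative restriction).
HAPLOTYPES = list(product((0, 1), repeat=2))

def compatible_pairs(genotype):
    """Ordered haplotype pairs whose allele counts add up to the genotype."""
    return [(h1, h2) for h1 in HAPLOTYPES for h2 in HAPLOTYPES
            if all(a + b == g for a, b, g in zip(h1, h2, genotype))]

def pair_posteriors(genotype, freqs):
    """Posterior probability of each compatible pair, assuming Hardy-Weinberg."""
    pairs = compatible_pairs(genotype)
    weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("genotype has zero probability under these frequencies")
    return [(pair, w / total) for pair, w in zip(pairs, weights)]

def em_haplotype_freqs(genotypes, n_iter=200):
    """E-M estimate of haplotype frequencies from unphased genotypes."""
    freqs = {h: 1.0 / len(HAPLOTYPES) for h in HAPLOTYPES}
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPLOTYPES}
        # E-step: expected haplotype counts given current frequencies.
        for g in genotypes:
            for (h1, h2), post in pair_posteriors(g, freqs):
                counts[h1] += post
                counts[h2] += post
        # M-step: renormalise over the 2N chromosomes.
        freqs = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
    return freqs

def expected_dosage(genotype, freqs, haplotype):
    """Expected number of copies (0 to 2) of `haplotype` given the genotype."""
    return sum(post * ((h1 == haplotype) + (h2 == haplotype))
               for (h1, h2), post in pair_posteriors(genotype, freqs))

# Made-up genotypes: allele counts (0, 1 or 2) at SNP 1 and SNP 2.
genotypes = [(0, 0)] * 5 + [(2, 2)] * 5 + [(1, 1)] * 2
freqs = em_haplotype_freqs(genotypes)
dosage = expected_dosage((1, 1), freqs, (0, 0))
```

In this toy data the double heterozygotes are the only phase-ambiguous genotypes; the expected dosage resolves them using the estimated frequencies, which is the sense in which haplotypes are "predicted" by SNP genotypes.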
Pages: 179-190
Number of pages: 12
Related papers
26 in total
[1]   Exposure stratified case-cohort designs [J].
Borgan, O ;
Langholz, B ;
Samuelsen, SO ;
Goldstein, L ;
Pogoda, J .
LIFETIME DATA ANALYSIS, 2000, 6 (01) :39-58
[2]  
COX D, 1979, THEORETICAL STAT, P512
[3]   High-resolution haplotype structure in the human genome [J].
Daly, MJ ;
Rioux, JD ;
Schaffner, SE ;
Hudson, TJ ;
Lander, ES .
NATURE GENETICS, 2001, 29 (02) :229-232
[4]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[5]  
EXCOFFIER L, 1995, MOL BIOL EVOL, V12, P921
[6]   Genetic analysis of case/control data using estimated haplotype frequencies: Application to APOE locus variation and Alzheimer's disease [J].
Fallin, D ;
Cohen, A ;
Essioux, L ;
Chumakov, I ;
Blumenfeld, M ;
Cohen, D ;
Schork, NJ .
GENOME RESEARCH, 2001, 11 (01) :143-151
[7]   GENOME SCREENING BY SEARCHING FOR SHARED SEGMENTS - MAPPING A GENE FOR BENIGN RECURRENT INTRAHEPATIC CHOLESTASIS [J].
HOUWEN, RHJ ;
BAHARLOO, S ;
BLANKENSHIP, K ;
RAEYMAEKERS, P ;
JUYN, J ;
SANDKUIJL, LA ;
FREIMER, NB .
NATURE GENETICS, 1994, 8 (04) :380-386
[8]  
Kolonel LN, 2000, AM J EPIDEMIOL, V151, P346, DOI 10.1093/oxfordjournals.aje.a010213
[9]  
LOUIS TA, 1982, J ROY STAT SOC B MET, V44, P226
[10]   LOGISTIC DISEASE INCIDENCE MODELS AND CASE-CONTROL STUDIES [J].
PRENTICE, RL ;
PYKE, R .
BIOMETRIKA, 1979, 66 (03) :403-411