Likelihood-based inference on haplotype effects in genetic association studies

Citations: 103
Authors
Lin, DY [1]
Zeng, D [1]
Affiliations
[1] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
Funding
National Institutes of Health (USA);
Keywords
case-control study; gene-environment interaction; Hardy-Weinberg equilibrium; missing data; single nucleotide polymorphism; unphased genotype;
DOI
10.1198/016214505000000808
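Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes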
020208 ; 070103 ; 0714 ;
Abstract
A haplotype is a specific sequence of nucleotides on a single chromosome. The population associations between haplotypes and disease phenotypes provide critical information about the genetic basis of complex human diseases. Standard genotyping techniques cannot distinguish the two homologous chromosomes of an individual, so only the unphased genotype (i.e., the combination of the two homologous haplotypes) is directly observable. Statistical inference about haplotype-phenotype associations based on unphased genotype data presents an intriguing missing-data problem, especially when the sampling depends on the disease status. The objective of this article is to provide a systematic and rigorous treatment of this problem. All commonly used study designs, including cross-sectional, case-control, and cohort studies, are considered. The phenotype can be a disease indicator, a quantitative trait, or a potentially censored time-to-disease variable. The effects of haplotypes on the phenotype are formulated through flexible regression models, which can accommodate various genetic mechanisms and gene-environment interactions. Appropriate likelihoods are constructed that may involve high-dimensional parameters. The identifiability of the parameters and the consistency, asymptotic normality, and efficiency of the maximum likelihood estimators are established. Efficient and reliable numerical algorithms are developed. Simulation studies show that the likelihood-based procedures perform well in practical settings. An application to the Finland-United States Investigation of NIDDM Genetics Study is provided. Areas in need of further development are discussed.
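To make the missing-data aspect concrete, the following is a minimal sketch of why an unphased genotype does not determine the underlying haplotype pair. It is an illustration only, not the authors' algorithm: the function name `compatible_haplotype_pairs` is hypothetical, and the coding of each SNP as 0, 1, or 2 copies of a reference allele is an assumption made for this example.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence with one entry per SNP, each 0, 1, or 2, giving the
    number of copies of the coded allele (an assumed coding for this sketch).
    Returns a sorted list of (haplotype, haplotype) tuples.
    """
    per_locus = []
    for g in genotype:
        if g == 0:
            per_locus.append([(0, 0)])          # homozygous: phase is known
        elif g == 2:
            per_locus.append([(1, 1)])          # homozygous: phase is known
        elif g == 1:
            per_locus.append([(0, 1), (1, 0)])  # heterozygous: phase is ambiguous
        else:
            raise ValueError("genotype entries must be 0, 1, or 2")

    pairs = set()
    for combo in product(*per_locus):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        # Sort so that (h1, h2) and (h2, h1) count as the same diplotype.
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


# A double heterozygote is compatible with two distinct haplotype pairs.
for pair in compatible_haplotype_pairs((1, 1)):
    print(pair)
```

With k heterozygous SNPs (k at least 1) the enumeration yields 2^(k-1) candidate pairs, which is why a likelihood for unphased data has to sum over all pairs compatible with each observed genotype.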
Pages: 89-104
Page count: 16