Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies

Cited by: 477
Authors
Jaffe, Andrew E. [1, 2, 3]
Murakami, Peter [3]
Lee, Hwajin [3]
Leek, Jeffrey T. [1]
Fallin, M. Daniele [1, 2, 3, 4]
Feinberg, Andrew P. [1, 3, 4]
Irizarry, Rafael A. [1, 3]
Affiliations
[1] Johns Hopkins Bloomberg Sch Publ Hlth, Dept Biostat, Baltimore, MD 21205 USA
[2] Johns Hopkins Bloomberg Sch Publ Hlth, Dept Epidemiol, Baltimore, MD 21205 USA
[3] Johns Hopkins Sch Med, Ctr Epigenet, Baltimore, MD USA
[4] Johns Hopkins Sch Med, Dept Med, Baltimore, MD USA
Keywords
Epigenetic epidemiology; DNA methylation; genome-wide analysis; bump hunting; batch effects; SURROGATE VARIABLE ANALYSIS; PLURIPOTENT STEM-CELLS; CPG ISLAND SHORES; DNA METHYLATION; GENE-EXPRESSION; CANCER; DISEASE; ARRAYS; MODE; CHIP;
DOI
10.1093/ije/dyr238
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene];
Discipline classification code
1004 ; 120402 ;
Abstract
Background: During the past 5 years, high-throughput technologies have been successfully used in epidemiological studies, but almost all have focused on sequence variation through genome-wide association studies (GWAS). Today, the study of other genomic events is becoming more common in large-scale epidemiological studies. Many of these, unlike the single-nucleotide polymorphisms studied in GWAS, are continuous measures. In this context, the exercise of searching for regions of interest for disease is akin to the problems described in the statistical 'bump hunting' literature.

Methods: New statistical challenges arise when the measurements are continuous rather than categorical, when they are measured with uncertainty, and when both biological signal and measurement errors are characterized by spatial correlation along the genome. Perhaps the most challenging complication is that continuous genomic data from large studies are measured throughout long periods, making them susceptible to 'batch effects'. An example that combines all three characteristics is genome-wide DNA methylation measurements. Here, we present a data analysis pipeline that effectively models measurement error, removes batch effects, detects regions of interest and attaches statistical uncertainty to identified regions.

Results: We illustrate the usefulness of our approach by detecting genomic regions of DNA methylation associated with a continuous trait in a well-characterized population of newborns. Additionally, we show that addressing unexplained heterogeneity like batch effects reduces the number of false-positive regions.

Conclusions: Our framework offers a comprehensive yet flexible approach for identifying genomic regions of biological interest in large epidemiological studies using quantitative high-throughput methods.
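
The region-detection and uncertainty steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a per-probe least-squares slope of methylation on the trait, swaps in a centered moving average as a simple stand-in for a smoother, and uses trait permutations to attach a p-value to each region. Batch-effect removal is omitted entirely. All function names, the cutoff, the window width and the `max_gap` value are hypothetical choices made for illustration, and the data are simulated.

```python
import numpy as np

def probe_slopes(meth, trait):
    """Least-squares slope of methylation on the trait, one value per probe.
    meth has shape (n_samples, n_probes); trait has shape (n_samples,)."""
    x = trait - trait.mean()
    return x @ (meth - meth.mean(axis=0)) / (x @ x)

def smooth(values, window=5):
    """Centered moving average (assumed stand-in for a proper smoother)."""
    return np.convolve(values, np.ones(window) / window, mode="same")

def find_bumps(pos, beta, cutoff, max_gap=300):
    """Return (first, last) probe indices of runs where |beta| > cutoff,
    the sign is constant, and neighbouring probes are within max_gap bp."""
    bumps, start = [], None
    for i in range(len(pos)):
        hit = abs(beta[i]) > cutoff
        if (hit and start is not None
                and pos[i] - pos[i - 1] <= max_gap
                and np.sign(beta[i]) == np.sign(beta[start])):
            continue  # probe i extends the open run
        if start is not None:
            bumps.append((start, i - 1))  # close the open run
            start = None
        if hit:
            start = i  # open a new run
    if start is not None:
        bumps.append((start, len(pos) - 1))
    return bumps

def bump_areas(beta, bumps):
    """Area of each bump: sum of |smoothed slope| over its probes."""
    return [float(np.abs(beta[s:e + 1]).sum()) for s, e in bumps]

def bump_hunt(pos, meth, trait, cutoff, n_perm=200, seed=0):
    """Detect bumps and compare each area with the largest bump area
    found after permuting the trait (n_perm permutations)."""
    rng = np.random.default_rng(seed)
    beta = smooth(probe_slopes(meth, trait))
    bumps = find_bumps(pos, beta, cutoff)
    areas = bump_areas(beta, bumps)
    null_max = np.zeros(n_perm)
    for b in range(n_perm):
        null_beta = smooth(probe_slopes(meth, rng.permutation(trait)))
        null_areas = bump_areas(null_beta, find_bumps(pos, null_beta, cutoff))
        null_max[b] = max(null_areas, default=0.0)
    # Add-one permutation p-value, so it is never exactly zero.
    pvals = [(1 + int((null_max >= a).sum())) / (1 + n_perm) for a in areas]
    return [(int(pos[s]), int(pos[e]), a, p)
            for (s, e), a, p in zip(bumps, areas, pvals)]

# Simulated example: 40 samples, 100 probes spaced 100 bp apart,
# with probes 40 to 49 associated with the trait (true slope 0.5).
rng = np.random.default_rng(1)
trait = rng.normal(size=40)
meth = rng.normal(scale=0.1, size=(40, 100))
meth[:, 40:50] += 0.5 * trait[:, None]
pos = np.arange(100) * 100
result = bump_hunt(pos, meth, trait, cutoff=0.25)
for start, end, area, p in result:
    print(f"region {start}-{end}: area={area:.2f}, p={p:.3f}")
```

Using the maximum null area per permutation is one simple way to account for searching across the whole genome; other choices of null statistic are possible and may behave differently.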
Pages: 200-209
Number of pages: 10