Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions

Times cited: 153
Authors
Houseman, E. Andres [1 ,2 ]
Christensen, Brock C. [2 ]
Yeh, Ru-Fang [3 ]
Marsit, Carmen J. [4 ]
Karagas, Margaret R. [5 ]
Wrensch, Margaret [6 ]
Nelson, Heather H. [7 ]
Wiemels, Joseph [3 ]
Zheng, Shichun [6 ]
Wiencke, John K. [6 ]
Kelsey, Karl T. [2 ,4 ]
Affiliations
[1] Harvard Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[2] Brown Univ, Dept Community Hlth, Ctr Environm Hlth & Technol, Providence, RI 02912 USA
[3] Univ Calif San Francisco, Dept Epidemiol & Biostat, San Francisco, CA 94143 USA
[4] Brown Univ, Dept Pathol & Lab Med, Providence, RI 02912 USA
[5] Dartmouth Hitchcock Med Ctr, Dept Community & Family Med, Lebanon, NH 03756 USA
[6] Univ Calif San Francisco, Dept Neurol Surg, San Francisco, CA 94143 USA
[7] Univ Minnesota, Sch Publ Hlth, Div Epidemiol & Community Hlth, Minneapolis, MN 55455 USA
DOI
10.1186/1471-2105-9-365
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Epigenetics is the study of heritable changes in gene function that cannot be explained by changes in DNA sequence. One of the most commonly studied epigenetic alterations is cytosine methylation, which is a well-recognized mechanism of epigenetic gene silencing and often occurs at tumor suppressor gene loci in human cancer. Arrays are now being used to study DNA methylation at a large number of loci; for example, the Illumina GoldenGate platform assesses DNA methylation at 1505 loci associated with over 800 cancer-related genes. Model-based cluster analysis is often used to identify DNA methylation subgroups in data, but it is unclear how to cluster DNA methylation data from arrays in a scalable and reliable manner.

Results: We propose a novel model-based recursive-partitioning algorithm to navigate clusters in a beta mixture model. We present simulations that show that the method is more reliable than competing nonparametric clustering approaches, and is at least as reliable as conventional mixture model methods. We also show that our proposed method is more computationally efficient than conventional mixture model approaches. We demonstrate our method on normal tissue samples and show that the clusters are associated with tissue type as well as age.

Conclusion: Our proposed recursively-partitioned mixture model is an effective and computationally efficient method for clustering DNA methylation data.
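The abstract describes the method only at a high level: recursively split samples, fitting a beta mixture at each node. The following is a minimal illustrative sketch of that idea, not the authors' implementation. It assumes: (1) within a class, loci are independent and each follows a beta distribution; (2) each node compares a 1-class and a 2-class fit by BIC and splits only if the 2-class fit wins; (3) the M-step uses weighted method-of-moments rather than maximum likelihood; (4) samples are hard-assigned to child nodes. All function and parameter names (`rpmm`, `fit_mixture`, `min_size`, `max_depth`) are hypothetical.

```python
import math
import random

EPS = 1e-6  # keeps methylation values strictly inside (0, 1)


def beta_logpdf(x, a, b):
    """Log density of Beta(a, b) at x."""
    x = min(max(x, EPS), 1 - EPS)
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def fit_beta_moments(values, weights):
    """Weighted method-of-moments estimate of (a, b); assumed here in place of ML."""
    w_sum = sum(weights)
    m = sum(w * v for w, v in zip(weights, values)) / w_sum
    var = sum(w * (v - m) ** 2 for w, v in zip(weights, values)) / w_sum
    m = min(max(m, EPS), 1 - EPS)
    var = min(max(var, 1e-8), 0.99 * m * (1 - m))  # a valid beta needs var < m(1-m)
    common = m * (1 - m) / var - 1
    return m * common, (1 - m) * common


def fit_mixture(data, k, n_iter=50):
    """Fit a k-class (k = 1 or 2) beta latent class model by EM.

    Assumes loci are independent within a class. Returns (loglik, labels).
    """
    n, n_loci = len(data), len(data[0])
    if k == 1:
        resp = [[1.0] for _ in range(n)]
    else:
        # Deterministic start: split samples at the median of their mean methylation.
        means = [sum(row) / n_loci for row in data]
        cut = sorted(means)[n // 2]
        resp = [[0.9, 0.1] if mu >= cut else [0.1, 0.9] for mu in means]
    loglik = -math.inf
    for _ in range(n_iter):  # fixed iteration count for simplicity
        # M-step: mixing proportions and per-class, per-locus beta parameters.
        pis = [sum(r[c] for r in resp) / n for c in range(k)]
        params = [[fit_beta_moments([row[j] for row in data], [r[c] for r in resp])
                   for j in range(n_loci)] for c in range(k)]
        # E-step: posterior class probabilities via log-sum-exp.
        loglik, resp = 0.0, []
        for row in data:
            logs = [math.log(pis[c])
                    + sum(beta_logpdf(x, a, b) for x, (a, b) in zip(row, params[c]))
                    for c in range(k)]
            top = max(logs)
            total = top + math.log(sum(math.exp(lp - top) for lp in logs))
            loglik += total
            # Floor keeps every class weight positive so the next M-step is defined.
            resp.append([max(math.exp(lp - total), 1e-12) for lp in logs])
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return loglik, labels


def bic(loglik, k, n_loci, n):
    """BIC = -2 log L + (number of free parameters) * log n."""
    n_params = (k - 1) + 2 * k * n_loci
    return -2 * loglik + n_params * math.log(n)


def rpmm(data, index=None, name="r", min_size=4, max_depth=3, depth=0):
    """Recursively split while a 2-class fit beats a 1-class fit on BIC.

    Samples are hard-assigned to child nodes (a simplifying assumption).
    Returns {leaf name: list of sample indices}.
    """
    if index is None:
        index = list(range(len(data)))
    if len(index) < 2 * min_size or depth >= max_depth:
        return {name: index}
    rows = [data[i] for i in index]
    n, n_loci = len(rows), len(rows[0])
    ll1, _ = fit_mixture(rows, 1)
    ll2, labels = fit_mixture(rows, 2)
    left = [i for i, lab in zip(index, labels) if lab == 0]
    right = [i for i, lab in zip(index, labels) if lab == 1]
    if bic(ll2, 2, n_loci, n) >= bic(ll1, 1, n_loci, n) or min(len(left), len(right)) < min_size:
        return {name: index}
    leaves = rpmm(data, left, name + "L", min_size, max_depth, depth + 1)
    leaves.update(rpmm(data, right, name + "R", min_size, max_depth, depth + 1))
    return leaves


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated data: 20 mostly unmethylated and 20 mostly methylated samples, 10 loci.
    data = ([[rng.betavariate(2, 8) for _ in range(10)] for _ in range(20)]
            + [[rng.betavariate(8, 2) for _ in range(10)] for _ in range(20)])
    for leaf, members in rpmm(data).items():
        print(leaf, len(members))
```

The recursion shows why this kind of approach can be cheaper than fitting a conventional K-class mixture for many values of K: each node only ever fits 1- and 2-class models, and the number of clusters emerges from where the BIC comparison stops splitting.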
Pages: 15
References
28 in total
[1]   Semi-supervised methods to predict patient survival from gene expression data [J].
Bair, E ;
Tibshirani, R .
PLOS BIOLOGY, 2004, 2 (04) :511-522
[2]   SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivation [J].
Blewitt, Marnie E. ;
Gendrel, Anne-Valerie ;
Pang, Zhenyi ;
Sparrow, Duncan B. ;
Whitelaw, Nadia ;
Craig, Jeffrey M. ;
Apedaile, Anwyn ;
Hilton, Douglas J. ;
Dunwoodie, Sally L. ;
Brockdorff, Neil ;
Kay, Graham F. ;
Whitelaw, Emma .
NATURE GENETICS, 2008, 40 (05) :663-669
[3]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[4]   Maximum likelihood from incomplete data via the EM algorithm [J].
Dempster, AP ;
Laird, NM ;
Rubin, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[5]   DNA methylation profiling of human chromosomes 6, 20 and 22 [J].
Eckhardt, Florian ;
Lewin, Joern ;
Cortese, Rene ;
Rakyan, Vardhman K. ;
Attwood, John ;
Burger, Matthias ;
Burton, John ;
Cox, Tony V. ;
Davies, Rob ;
Down, Thomas A. ;
Haefliger, Carolina ;
Horton, Roger ;
Howe, Kevin ;
Jackson, David K. ;
Kunde, Jan ;
Koenig, Christoph ;
Liddle, Jennifer ;
Niblett, David ;
Otto, Thomas ;
Pettett, Roger ;
Seemann, Stefanie ;
Thompson, Christian ;
West, Tony ;
Rogers, Jane ;
Olek, Alex ;
Berlin, Kurt ;
Beck, Stephan .
NATURE GENETICS, 2006, 38 (12) :1378-1385
[6]   Epigenetic differences arise during the lifetime of monozygotic twins [J].
Fraga, MF ;
Ballestar, E ;
Paz, MF ;
Ropero, S ;
Setien, F ;
Ballestar, ML ;
Heine-Suñer, D ;
Cigudosa, JC ;
Urioste, M ;
Benitez, J ;
Boix-Chornet, M ;
Sanchez-Aguilera, A ;
Ling, C ;
Carlsson, E ;
Poulsen, P ;
Vaag, A ;
Stephan, Z ;
Spector, TD ;
Wu, YZ ;
Plass, C ;
Esteller, M .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2005, 102 (30) :10604-10609
[7]   FRALEY C, 2005, BAYESIAN REGULARIZAT
[8]   Epigenetic remodeling in colorectal cancer results in coordinate gene suppression across an entire chromosome band [J].
Frigola, J ;
Song, J ;
Stirzaker, C ;
Hinshelwood, RA ;
Peinado, MA ;
Clark, SJ .
NATURE GENETICS, 2006, 38 (05) :540-549
[9]   Hastie T., 2009, The Elements of Statistical Learning, P9
[10]   Feature-specific penalized latent class analysis for genomic data [J].
Houseman, E. Andres ;
Coull, Brent A. ;
Betensky, Rebecca A. .
BIOMETRICS, 2006, 62 (04) :1062-1070