A robust unified approach to analyzing methylation and gene expression data

Cited by: 7
Authors
Khalili, Abbas [1]
Huang, Tim [2]
Lin, Shili [1,3]
Affiliations
[1] Ohio State Univ, Dept Stat, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Comprehens Canc, Div Human Canc Genet, Columbus, OH 43210 USA
[3] Ohio State Univ, Math Biosci Inst, Columbus, OH 43210 USA
Funding
US National Science Foundation;
Keywords
MAXIMUM-LIKELIHOOD; MIXTURE; MICROARRAYS; CLASSIFICATION; CONSISTENCY; BAYES;
DOI
10.1016/j.csda.2008.07.010
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Microarray technology has made it possible to investigate expression levels, and more recently methylation signatures, of thousands of genes simultaneously, in a biological sample. Since more and more data from different biological systems or technological platforms are being generated at an incredible rate, there is an increasing need to develop statistical methods that are applicable to multiple data types and platforms. Motivated by such a need, a flexible finite mixture model that is applicable to methylation, gene expression, and potentially data from other biological systems, is proposed. Two major thrusts of this approach are to allow for a variable number of components in the mixture to capture non-biological variation and small biases, and to use a robust procedure for parameter estimation and probe classification. The method was applied to the analysis of methylation signatures of three breast cancer cell lines. It was also tested on three sets of expression microarray data to study its power and type I error rates. Comparison with a number of existing methods in the literature yielded very encouraging results; lower type I error rates and comparable/better power were achieved based on the limited study. Furthermore, the method also leads to more biologically interpretable results for the three breast cancer cell lines. (C) 2008 Elsevier B.V. All rights reserved.
Pages: 1701-1710
Number of pages: 10