Robust inference of groups in gene expression time-courses using mixtures of HMMs

Cited by: 19
Authors
Schliep, Alexander [1]
Steinhoff, Christine [1]
Schoenhuth, Alexander [2]
Affiliations
[1] Max Planck Inst Mol Genet, Dept Computat Mol Biol, D-14195 Berlin, Germany
[2] Univ Cologne, Ctr Appl Comp Sci, D-50937 Cologne, Germany
DOI
10.1093/bioinformatics/bth937
CLC number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Genetic regulation of cellular processes is frequently investigated using large-scale gene expression experiments to observe changes in expression over time. This temporal data poses a challenge to classical distance-based clustering methods due to its horizontal dependencies along the time axis. We propose to use hidden Markov models (HMMs) to explicitly model these time dependencies. The HMMs are used in a mixture approach that we show to be superior to clustering. Furthermore, mixtures are a more realistic model of the biological reality, as an unambiguous partitioning of genes into clusters of unique functional assignment is impossible. Use of the mixture increases robustness with respect to noise and allows inference of groups at varying levels of assignment ambiguity. A simple approach, partially supervised learning, makes it possible to benefit from prior biological knowledge during training. Our method allows simultaneous analysis of cyclic and non-cyclic genes and copes well with noise and missing values.
Results: We demonstrate biological relevance by detection of phase-specific groupings in HeLa time-course data. A benchmark using simulated data, derived using assumptions independent of those in our method, shows very favorable results compared to the baseline supplied by k-means and two prior approaches implementing model-based clustering. The results stress the benefits of incorporating prior knowledge whenever it is available.
Pages: 283-289
Page count: 7