Classifying short gene expression time-courses with Bayesian estimation of piecewise constant functions

Cited by: 11
Authors
Hafemeister, Christoph [1 ,2 ,3 ]
Costa, Ivan G. [4 ]
Schonhuth, Alexander [5 ]
Schliep, Alexander [2 ,3 ]
Affiliations
[1] Max Planck Inst Mol Genet, Dept Computat Mol Biol, Berlin, Germany
[2] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08854 USA
[3] Rutgers State Univ, BioMaPS Inst Quantitat Biol, Piscataway, NJ 08854 USA
[4] Univ Fed Pernambuco, Ctr Informat, Recife, PE, Brazil
[5] Ctr Wiskunde & Informat, NL-1098 XG Amsterdam, Netherlands
Keywords
HIDDEN MARKOV-MODELS; PROFILES; CLASSIFICATION; RESPONSES;
DOI
10.1093/bioinformatics/btr037
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Analyzing short time-courses is a frequent and relevant problem in molecular biology; for example, 90% of gene expression time-course experiments span at most nine time-points. The biological or clinical questions addressed include elucidating gene regulation by identifying co-expressed genes, predicting response to treatment in clinical trial-like settings, and classifying novel toxic compounds based on the similarity of their gene expression time-courses to those of known toxic compounds. The last problem is characterized by irregular and infrequent sample times and a total lack of prior assumptions about the incoming query, which stands in stark contrast to clinical settings and requires implicitly performing a local, gapped alignment of time series. The current state-of-the-art method (SCOW) uses a variant of dynamic time warping and models time series as higher-order polynomials (splines).
Results: We propose modeling time-courses that monitor response to toxins by piecewise constant functions, which are represented as left-right Hidden Markov Models. A Bayesian approach to parameter estimation and inference helps to cope with the short but highly multivariate time-courses. We improve prediction accuracy by 7% and 4%, respectively, when classifying toxicology and stress response data. We also reduce running times by at least a factor of 140; note that reasonable running times are crucial when classifying response to toxins. In conclusion, we have demonstrated that appropriate reduction of model complexity can result in substantial improvements in both classification performance and running time.
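The idea of describing a short time-course by a piecewise constant function can be illustrated with a small sketch. The code below is not the method of the paper: it omits the Bayesian estimation entirely and instead finds the least-squares split of one univariate series into a fixed number of contiguous constant segments. Under the simplifying assumptions of equal-variance Gaussian emissions and uninformative transitions, this roughly corresponds to choosing the most likely path through a left-right state chain. The function name `fit_piecewise_constant` and its parameters are hypothetical.

```python
def fit_piecewise_constant(values, n_segments):
    """Split `values` into `n_segments` contiguous runs, each summarized by its mean.

    Returns (bounds, levels): half-open index ranges and the constant level per run.
    Illustrative least-squares segmentation only, not the paper's Bayesian HMM.
    """
    n = len(values)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(values)")

    # Prefix sums let us compute the squared error of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(i, j):
        """Squared error of values[i:j] around its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal error when values[:j] is covered by exactly k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cand = best[k - 1][i] + sse(i, j)
                if cand < best[k][j]:
                    best[k][j] = cand
                    back[k][j] = i

    # Trace the segment boundaries back from the end of the series.
    bounds = []
    j = n
    for k in range(n_segments, 0, -1):
        i = back[k][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    levels = [(s1[j] - s1[i]) / (j - i) for i, j in bounds]
    return bounds, levels


# Made-up eight-point series with three plateaus (not data from the paper).
series = [0.1, 0.0, -0.1, 2.0, 2.1, 1.9, 0.5, 0.4]
bounds, levels = fit_piecewise_constant(series, 3)
print(bounds, levels)
```

The left-right constraint appears here as the requirement that segments are contiguous and visited in order, so a series can stay at its current level or move on to the next one but never return to an earlier one.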
Pages: 946-952
Page count: 7
Related papers
31 in total
[1]   Analyzing time series gene expression data [J].
Bar-Joseph, Z .
BIOINFORMATICS, 2004, 20 (16) :2493-2503
[2]   A MAXIMIZATION TECHNIQUE OCCURRING IN STATISTICAL ANALYSIS OF PROBABILISTIC FUNCTIONS OF MARKOV CHAINS [J].
BAUM, LE ;
PETRIE, T ;
SOULES, G ;
WEISS, N .
ANNALS OF MATHEMATICAL STATISTICS, 1970, 41 (01) :164-&
[3]   Timing of Gene Expression Responses to Environmental Changes [J].
Chechik, Gal ;
Koller, Daphne .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2009, 16 (02) :279-290
[4]   The Graphical Query Language: a tool for analysis of gene expression time-courses [J].
Costa, IG ;
Schönhuth, A ;
Schliep, A .
BIOINFORMATICS, 2005, 21 (10) :2544-2545
[5]   Constrained mixture estimation for analysis and robust classification of clinical time series [J].
Costa, Ivan G. ;
Schoenhuth, Alexander ;
Hafemeister, Christoph ;
Schliep, Alexander .
BIOINFORMATICS, 2009, 25 (12) :I6-I14
[6]   NEAREST NEIGHBOR PATTERN CLASSIFICATION [J].
COVER, TM ;
HART, PE .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1967, 13 (01) :21-+
[7]  
Durbin R., 1998, Biological sequence analysis: probabilistic models of proteins and nucleic acids
[8]   Gene Expression Omnibus: NCBI gene expression and hybridization array data repository [J].
Edgar, R ;
Domrachev, M ;
Lash, AE .
NUCLEIC ACIDS RESEARCH, 2002, 30 (01) :207-210
[9]   Histone deacetylase inhibitor panobinostat induces clinical responses with associated alterations in gene expression profiles in cutaneous T-cell lymphoma [J].
Ellis, Leigh ;
Pan, Yan ;
Smyth, Gordon K. ;
George, Daniel J. ;
McCormack, Chris ;
Williams-Truax, Roxanne ;
Mita, Monica ;
Beck, Joachim ;
Burris, Howard ;
Ryan, Gail ;
Atadja, Peter ;
Butterfoss, Dale ;
Dugan, Margaret ;
Culver, Kenneth ;
Johnstone, Ricky W. ;
Prince, H. Miles .
CLINICAL CANCER RESEARCH, 2008, 14 (14) :4500-4510
[10]   Clustering short time series gene expression data [J].
Ernst, J ;
Nau, GJ ;
Bar-Joseph, Z .
BIOINFORMATICS, 2005, 21 :I159-I168