An integrated machine learning approach for predicting DosR-regulated genes in Mycobacterium tuberculosis

Cited by: 5
Authors
Zhang, Yi [1]
Hatch, Kim A. [2]
Bacon, Joanna [2]
Wernisch, Lorenz [1,3]
Affiliations
[1] Univ London Birkbeck Coll, Sch Crystallog, London WC1E 7HX, England
[2] TB Res, Hlth Protect Agcy, CEPR, Salisbury SP4 0JG, Wilts, England
[3] Univ Forvie Site, MRC, Biostat Unit, Cambridge CB2 0SR, England
Funding
Wellcome Trust (UK);
Keywords
NETWORK COMPONENT ANALYSIS; TRANSCRIPTION FACTOR; HYPOXIC RESPONSE; EXPRESSION; RECONSTRUCTION; MICROARRAY;
DOI
10.1186/1752-0509-4-37
Chinese Library Classification number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
Background: DosR is an important regulator of the response to stress, such as limited oxygen availability, in Mycobacterium tuberculosis. Time course gene expression data enable us to dissect this response at the gene regulatory level. The mRNA expression profile of a regulator, however, is not necessarily a direct reflection of its activity. Knowledge of the transcription factor activity (TFA) can be exploited to predict novel target genes regulated by the same transcription factor. Various approaches have been proposed to reconstruct TFAs from gene expression data. Most of them either capture only a first-order approximation to the complex transcriptional processes, by assuming linear gene responses and linear dynamics in TFA, or ignore the temporal information in data from such systems.
Results: In this paper, we approach the problem of inferring dynamic hidden TFAs using Gaussian processes (GP). We are able to model dynamic TFAs and to account for both linear and nonlinear gene responses. To test the validity of the proposed approach, we reconstruct the hidden TFA of p53, a tumour suppressor activated by DNA damage, using published time course gene expression data. Our reconstructed TFA is closer to the experimentally determined profile of p53 concentration than that from the original study. We then apply the model to time course gene expression data obtained from chemostat cultures of M. tuberculosis under reduced oxygen availability. After estimating the TFA of DosR from a number of known target genes using the GP model, we predict novel DosR-regulated genes: the parameters of the model are interpreted as relevance parameters indicating an existing functional relationship between TFA and gene expression. We further improve the prediction by integrating promoter sequence information in a logistic regression model. Apart from the documented DosR-regulated genes, our prediction yields ten novel genes under direct control of DosR.
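As a rough illustration of the Gaussian process machinery mentioned in the Results, the following minimal sketch fits a standard GP regression with a squared-exponential kernel to a noisy time course. It is not the paper's hidden-TFA model, which treats the TFA as latent and links it to several target genes; it only shows how a GP yields a smooth posterior profile over time. The kernel choice, the hyperparameter values, the synthetic data and all function names are assumptions made for this example.

```python
import numpy as np

def rbf_kernel(t1, t2, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance between two sets of time points."""
    diff = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def gp_posterior(t_obs, y_obs, t_new, variance=1.0, lengthscale=1.0, noise=0.1):
    """Posterior mean and pointwise variance of a zero-mean GP at t_new."""
    K = rbf_kernel(t_obs, t_obs, variance, lengthscale)
    K += noise ** 2 * np.eye(len(t_obs))          # observation noise on the diagonal
    K_s = rbf_kernel(t_obs, t_new, variance, lengthscale)
    K_ss = rbf_kernel(t_new, t_new, variance, lengthscale)
    L = np.linalg.cholesky(K)                     # stable solve via Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var

# Synthetic time course (illustrative only, not data from the paper).
rng = np.random.default_rng(0)
t_obs = np.linspace(0.0, 10.0, 12)
y_obs = np.sin(0.6 * t_obs) + 0.05 * rng.standard_normal(t_obs.size)
t_new = np.linspace(0.0, 10.0, 50)

mean, var = gp_posterior(t_obs, y_obs, t_new, lengthscale=2.0, noise=0.05)
```

In practice the kernel hyperparameters would be estimated from the data, for example by maximising the marginal likelihood, rather than fixed by hand as they are here.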
Conclusions: Chemostat cultures are an ideal experimental system for controlling noise and variability when monitoring the response of bacterial organisms such as M. tuberculosis to finely controlled changes in culture conditions and available metabolites. Nonlinear hidden TFA dynamics of regulators can be reconstructed remarkably well with Gaussian processes from such data. Moreover, estimated parameters of the GP can be used to assess whether a gene is controlled by the reconstructed TFA or not. It is straightforward to combine these parameters with further information, such as the presence of binding motifs, to increase prediction accuracy.
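The combination of expression-derived parameters with promoter sequence information described above can be sketched as a two-feature logistic regression. This is a minimal sketch under stated assumptions: the two features (a per-gene relevance score and a binding-motif score), the synthetic labels, the gradient-descent fit and all names are hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000, l2=1e-3):
    """Fit logistic regression by gradient descent; returns [intercept, weights...]."""
    Xb = np.column_stack([np.ones(len(X)), X])    # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)
        grad = Xb.T @ (p - y) / len(y) + l2 * w   # mean log-loss gradient plus L2 penalty
        w -= lr * grad
    return w

def predict_proba(X, w):
    """Predicted probability that each gene is a target of the regulator."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return sigmoid(Xb @ w)

# Synthetic genes (illustrative only): label 1 = regulated, 0 = not regulated.
rng = np.random.default_rng(1)
n = 200
labels = rng.integers(0, 2, size=n).astype(float)
relevance = 1.5 * labels + rng.standard_normal(n)  # hypothetical expression-based score
motif = 1.0 * labels + rng.standard_normal(n)      # hypothetical promoter motif score
X = np.column_stack([relevance, motif])

w = fit_logistic(X, labels)
probs = predict_proba(X, w)
accuracy = np.mean((probs > 0.5) == (labels == 1))
```

Genes could then be ranked by `probs`, so that candidates supported by both the expression-based score and the motif score rise to the top of the list.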
Pages: 11