Detailing regulatory networks through large scale data integration

Cited by: 48
Authors
Huttenhower, Curtis [1 ,2 ]
Mutungu, K. Tsheko [1 ]
Indik, Natasha [1 ]
Yang, Woongcheol [1 ]
Schroeder, Mark [2 ]
Forman, Joshua J. [3 ]
Troyanskaya, Olga G. [1 ,2 ]
Coller, Hilary A. [3 ]
Affiliations
[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08540 USA
[2] Princeton Univ, Carl Icahn Lab, Lewis Sigler Inst Integrat Genom, Princeton, NJ 08544 USA
[3] Princeton Univ, Lewis Thomas Lab, Dept Mol Biol, Princeton, NJ 08544 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA);
Keywords
FUNCTIONAL GENOMIC DATA; FACTOR-BINDING SITES; GENE-EXPRESSION DATA; ESCHERICHIA-COLI; TRANSCRIPTION; MODULES; SEQUENCE; YEAST; PROMOTERS; DISCOVERY;
DOI
10.1093/bioinformatics/btp588
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Much of a cell's regulatory response to changing environments occurs at the transcriptional level. Particularly in higher organisms, transcription factors (TFs), microRNAs and epigenetic modifications can combine to form a complex regulatory network. Part of this system can be modeled as a collection of regulatory modules: co-regulated genes, the conditions under which they are co-regulated and sequence-level regulatory motifs. Results: We present the Combinatorial Algorithm for Expression and Sequence-based Cluster Extraction (COALESCE) system for regulatory module prediction. The algorithm is efficient enough to discover expression biclusters and putative regulatory motifs in metazoan genomes (>20 000 genes) and very large microarray compendia (>10 000 conditions). Using Bayesian data integration, it can also include diverse supporting data types such as evolutionary conservation or nucleosome placement. We validate its performance using a functional evaluation of co-clustered genes, known yeast and Escherichia coli TF targets, synthetic data and various metazoan data compendia. In all cases, COALESCE performs as well as or better than current biclustering and motif prediction tools, with high accuracy in functional and TF/target assignments and zero false positives on synthetic data. COALESCE provides an efficient and flexible platform within which large, diverse data collections can be integrated to predict metazoan regulatory networks.
Pages: 3267-3274
Page count: 8