info-gibbs: a motif discovery algorithm that directly optimizes information content during sampling

Cited by: 20
Authors
Defrance, Matthieu [1]
van Helden, Jacques [1]
Affiliation
[1] Univ Libre Bruxelles, Lab Bioinformat Genomes & Reseaux BiGRe, B-1050 Brussels, Belgium
Keywords
FACTOR-BINDING SITES; SEQUENCE-ANALYSIS TOOLS; REGULATORY ELEMENTS; STATISTICAL OVERREPRESENTATION; SACCHAROMYCES-CEREVISIAE; NONCODING SEQUENCES; GENETIC ALGORITHM; TRANSCRIPTION; DNA; IDENTIFICATION;
DOI
10.1093/bioinformatics/btp490
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
070307 [Chemical Biology];
Abstract
Motivation: Discovering cis-regulatory elements in genome sequence remains a challenging issue. Several methods rely on the optimization of some target scoring function. The information content (IC) or relative entropy of the motif has proven to be a good estimator of transcription factor DNA binding affinity. However, these information-based metrics are usually used as a posteriori statistics rather than during the motif search process itself.
Results: We introduce here info-gibbs, a Gibbs sampling algorithm that efficiently optimizes the IC or the log-likelihood ratio (LLR) of the motif while keeping computation time low. The method compares well with existing methods like MEME, BioProspector, Gibbs or GAME on both synthetic and biological datasets. Our study shows that motif discovery techniques can be enhanced by directly focusing the search on the motif IC or the motif LLR.
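Illustration: the two scores named in the abstract, the information content (IC, or relative entropy) and the log-likelihood ratio (LLR) of a motif, can be sketched for a set of aligned sites as follows. This is a minimal sketch using the common textbook definitions, not the info-gibbs implementation: the function names are hypothetical, the background is assumed to be a zero-order model (uniform by default), and no pseudocounts are applied, whereas the published method may differ on each of these points.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def column_frequencies(sites):
    """Return one {base: frequency} dict per column of the aligned sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    n = len(sites)
    freqs = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        freqs.append({b: counts.get(b, 0) / n for b in ALPHABET})
    return freqs


def information_content(sites, background=None):
    """Relative entropy of the motif against the background, in bits.

    Assumption: zero-order background, uniform unless one is supplied.
    Zero frequencies contribute nothing (0 * log 0 is taken as 0).
    """
    bg = background or {b: 0.25 for b in ALPHABET}
    total = 0.0
    for col in column_frequencies(sites):
        for b in ALPHABET:
            f = col[b]
            if f > 0:
                total += f * math.log2(f / bg[b])
    return total


def log_likelihood_ratio(sites, background=None):
    """LLR of the sites under the motif model versus the background, in nats.

    With these definitions, LLR = N * IC * ln(2), where N is the number of sites.
    """
    bg = background or {b: 0.25 for b in ALPHABET}
    n = len(sites)
    total = 0.0
    for col in column_frequencies(sites):
        for b in ALPHABET:
            f = col[b]
            if f > 0:
                total += n * f * math.log(f / bg[b])
    return total


if __name__ == "__main__":
    sites = ["ACGT", "ACGT", "ACGA", "TCGT"]
    print(information_content(sites))
    print(log_likelihood_ratio(sites))
```

Under these assumptions, a fully conserved column contributes 2 bits to the IC against a uniform background, and a column with all four bases equally frequent contributes 0. Because the LLR scales the IC by the number of sites, optimizing the LLR also rewards motifs supported by more sites, while the IC alone measures only how sharply the columns differ from the background.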
Pages: 2715-2722
Page count: 8