Modeling within-motif dependence for transcription factor binding site predictions

Cited by: 102
Authors
Zhou, Q [1]
Liu, JS [1]
Affiliations
[1] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
Funding
National Science Foundation (USA)
DOI
10.1093/bioinformatics/bth006
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The position-specific weight matrix (PWM) model, which assumes that each position in the DNA site contributes independently to the overall protein-DNA interaction, has been the primary means to describe transcription factor binding site motifs. Recent biological experiments, however, suggest that there exists interdependence among positions in the binding sites. In order to exploit this interdependence to aid motif discovery, we extend the PWM model to include pairs of correlated positions and design a Markov chain Monte Carlo algorithm to sample in the model space. We then combine the model sampling step with the Gibbs sampling framework for de novo motif discoveries.

Results: Testing on experimentally validated binding sites, we find that about 25% of the transcription factor binding motifs show significant within-site position correlations, and 80% of these motif models can be improved by considering the correlated positions. Using both simulated data and real promoter sequences, we show that the new de novo motif-finding algorithm can infer the true correlated position pairs accurately and is more precise in finding putative transcription factor binding sites than the standard Gibbs sampling algorithms.
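As a rough illustration of the idea in the abstract (not the authors' algorithm), the sketch below compares two ways of scoring a set of aligned binding sites: a PWM that treats every position independently, and a variant that models one chosen pair of positions jointly as a single 16-outcome variable. The function names, the pseudocount of 1, and the use of a smoothed plug-in log-likelihood are all assumptions made for this example; the paper itself samples over models with Markov chain Monte Carlo, which is not reproduced here.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def _column_log_likelihood(symbols, n_outcomes, pseudocount):
    """Plug-in log-likelihood of one column under smoothed frequency estimates."""
    counts = Counter(symbols)
    denom = len(symbols) + pseudocount * n_outcomes
    return sum(n * math.log((n + pseudocount) / denom) for n in counts.values())


def pwm_log_likelihood(sites, pseudocount=1.0):
    """Score aligned sites with every position modeled independently (plain PWM)."""
    width = len(sites[0])
    return sum(
        _column_log_likelihood([s[j] for s in sites], len(ALPHABET), pseudocount)
        for j in range(width)
    )


def pair_log_likelihood(sites, pair, pseudocount=1.0):
    """Score aligned sites with the two positions in `pair` modeled jointly.

    Hypothetical helper: the pair is treated as one variable with 16 outcomes,
    and all remaining positions are scored independently as in the plain PWM.
    """
    i, k = pair
    width = len(sites[0])
    total = _column_log_likelihood(
        [s[i] + s[k] for s in sites], len(ALPHABET) ** 2, pseudocount
    )
    for j in range(width):
        if j not in (i, k):
            total += _column_log_likelihood(
                [s[j] for s in sites], len(ALPHABET), pseudocount
            )
    return total


if __name__ == "__main__":
    # Toy data (made up): positions 0 and 1 always agree, position 2 is constant.
    correlated_sites = ["AAC", "TTC"] * 10
    print("independent:", pwm_log_likelihood(correlated_sites))
    print("pair (0, 1):", pair_log_likelihood(correlated_sites, (0, 1)))
```

On toy data where two positions always co-vary, the pair model scores higher than the independent model. When the two positions vary independently, the extra parameters of the pair model are penalized by the pseudocounts and it scores lower, which is the kind of trade-off a model-selection step has to weigh.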
Pages: 909-916
Number of pages: 8