A non-parametric model for transcription factor binding sites

被引:42
作者
King, OD [1 ]
Roth, FP [1 ]
机构
[1] Harvard Univ, Sch Med, Dept Biol Chem & Mol Pharmacol, Boston, MA 02115 USA
基金
美国国家卫生研究院;
关键词
D O I
10.1093/nar/gng117
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
We introduce a non-parametric representation of transcription factor binding sites which can model arbitrary dependencies between positions. As two parameters are varied, this representation smoothly interpolates between the empirical distribution of binding sites and the standard position-specific scoring matrix (PSSM). In a test of generalization to unseen binding sites using 10-fold cross-validation on known binding sites for 95 TRANSFAC transcription factors, this representation outperforms PSSMs on between 65 and 89 of the 95 transcription factors, depending on the choice of the two adjustable parameters. We also discuss how the non- parametric representation may be incorporated into frameworks for finding binding sites given only a collection of unaligned promoter regions.
引用
收藏
页数:8
相关论文
共 31 条
[1]  
AGARWAL P, 1998, 2 ANN C RES COMP MOL, P2
[2]  
BARASH Y, 2003, IN PRESS P 7 ANN INT
[3]   Additivity in protein-DNA interactions: how good an approximation is it? [J].
Benos, PV ;
Bulyk, ML ;
Stormo, GD .
NUCLEIC ACIDS RESEARCH, 2002, 30 (20) :4442-4451
[4]   SELECTION OF DNA-BINDING SITES BY REGULATORY PROTEINS - STATISTICAL-MECHANICAL THEORY AND APPLICATION TO OPERATORS AND PROMOTERS [J].
BERG, OG ;
VONHIPPEL, PH .
JOURNAL OF MOLECULAR BIOLOGY, 1987, 193 (04) :723-743
[5]  
Bishop C. M., 1995, NEURAL NETWORKS PATT
[6]   Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors [J].
Bulyk, ML ;
Johnson, PLF ;
Church, GM .
NUCLEIC ACIDS RESEARCH, 2002, 30 (05) :1255-1261
[7]  
CHEN QK, 1995, COMPUT APPL BIOSCI, V11, P563
[8]  
CHRISTENSEN SW, 2003, J MACHINE LEARNING R, V4, P39
[9]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[10]   On the optimality of the simple Bayesian classifier under zero-one loss [J].
Domingos, P ;
Pazzani, M .
MACHINE LEARNING, 1997, 29 (2-3) :103-130