Similarity of position frequency matrices for transcription factor binding sites

Times cited: 78
Authors
Schones, DE
Sumazin, P
Zhang, MQ
Affiliations
[1] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA
[2] SUNY Stony Brook, Dept Phys & Astron, Stony Brook, NY 11794 USA
[3] Portland State Univ, Dept Comp Sci, Portland, OR 97207 USA
Funding
National Science Foundation (USA)
DOI
10.1093/bioinformatics/bth480
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Transcription-factor binding sites (TFBS) in promoter sequences of higher eukaryotes are commonly modeled using position frequency matrices (PFM). The ability to compare PFMs representing binding sites is especially important for de novo sequence motif discovery, where it is desirable to compare putative matrices to one another and to known matrices.
Results: We describe a PFM similarity quantification method based on product multinomial distributions, demonstrate its ability to identify PFM similarity and show that it has a better false positive to false negative ratio compared to existing methods. We grouped TFBS frequency matrices from two libraries into matrix families and identified the matrices that are common and unique to these libraries. We identified similarities and differences between the skeletal-muscle-specific and non-muscle-specific frequency matrices for the binding sites of Mef-2, Myf, Sp-1, SRF and TEF of Wasserman and Fickett. We further identified known frequency matrices and matrix families that were strongly similar to the matrices given by Wasserman and Fickett. We provide methodology and tools to compare and query libraries of frequency matrices for TFBSs.
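The abstract does not spell out the test statistic, so the following is only an illustrative sketch of one way a "product multinomial" comparison could look, not the authors' implementation. It assumes two PFMs of equal width that are already aligned, treats each column as an independent multinomial sample over A, C, G, T, and applies a Pearson chi-square test of homogeneity column by column. The function names (`chi2_sf`, `column_homogeneity_pvalue`, `compare_pfms`) and the example counts are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, closed forms for df 0..3."""
    if df == 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("df must be between 0 and 3 for a four-letter alphabet")


def column_homogeneity_pvalue(col_a, col_b):
    """Pearson chi-square test that two count columns share one multinomial."""
    n_a, n_b = sum(col_a), sum(col_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("each column needs at least one observed site")
    total = n_a + n_b
    stat, used = 0.0, 0
    for a, b in zip(col_a, col_b):
        pooled = a + b
        if pooled == 0:
            continue  # nucleotide absent from both columns: no information
        used += 1
        exp_a = n_a * pooled / total  # expected count under homogeneity
        exp_b = n_b * pooled / total
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return chi2_sf(stat, used - 1)


def compare_pfms(pfm_a, pfm_b):
    """Return one p-value per aligned column; each column is [A, C, G, T] counts."""
    if len(pfm_a) != len(pfm_b):
        raise ValueError("this sketch assumes pre-aligned matrices of equal width")
    return [column_homogeneity_pvalue(a, b) for a, b in zip(pfm_a, pfm_b)]


# Hypothetical counts, for illustration only.
example_a = [[18, 1, 1, 0], [2, 2, 14, 2], [5, 5, 5, 5]]
example_b = [[17, 2, 0, 1], [1, 15, 2, 2], [5, 5, 5, 5]]
p_values = compare_pfms(example_a, example_b)
```

Small p-values flag columns whose nucleotide preferences differ. The chi-square approximation is unreliable for sparse counts, where an exact test would be the safer choice, and handling unaligned or unequal-width matrices would require trying offsets and reverse complements, which this sketch leaves out.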
Pages: 307-313
Number of pages: 7