Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites

Cited by: 64
Authors
Gershenzon, NI
Stormo, GD
Ioshikhes, IP
Affiliations
[1] Ohio State Univ, Dept Biomed Informat, Columbus, OH 43210 USA
[2] Washington Univ, Sch Med, Dept Genet, St Louis, MO 63110 USA
DOI
10.1093/nar/gki519
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Position-weight matrices (PWMs) are broadly used to locate transcription factor binding sites in DNA sequences. The majority of existing PWMs provide a low level of both sensitivity and specificity. We present a new computational algorithm, a modification of the Staden-Bucher approach, that improves the PWM. We applied the proposed technique to the PWM of the GC-box, the binding site for Sp1. A comparison of the old and new PWMs shows that the latter increases both sensitivity and specificity. The statistical parameters of the GC-box distribution in promoter regions and in the human genome, as well as in each chromosome, are presented. The majority of commonly used PWMs are 4-row mononucleotide matrices, although 16-row dinucleotide matrices are known to be more informative. The algorithm efficiently determines the 16-row matrices, and preliminary results show that such matrices provide better results than 4-row matrices.
Pages: 2290-2301
Page count: 12