Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites

Cited by: 64
Authors
Gershenzon, NI
Stormo, GD
Ioshikhes, IP
Affiliations
[1] Ohio State Univ, Dept Biomed Informat, Columbus, OH 43210 USA
[2] Washington Univ, Sch Med, Dept Genet, St Louis, MO 63110 USA
DOI
10.1093/nar/gki519
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Position-weight matrices (PWMs) are widely used to locate transcription factor binding sites in DNA sequences. Most existing PWMs provide low sensitivity and low specificity. We present a new computational algorithm, a modification of the Staden-Bucher approach, that improves a PWM. We applied the proposed technique to the PWM of the GC-box, the binding site for Sp1. Comparison of the old and new PWMs shows that the new PWMs increase both sensitivity and specificity. We also present the statistical parameters of the GC-box distribution in promoter regions, in the human genome as a whole, and in each chromosome. Most commonly used PWMs are 4-row mononucleotide matrices, although 16-row dinucleotide matrices are known to be more informative. The algorithm efficiently determines 16-row matrices, and preliminary results show that such matrices perform better than 4-row matrices.
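
To make the distinction between 4-row and 16-row matrices concrete, here is a minimal Python sketch of generic log-odds PWM construction and scanning. It is not the modified Staden-Bucher algorithm described in the abstract, whose details are not given here. The function names, the uniform background, the pseudocount, and the toy GC-box-like sites are all assumptions made for illustration; they are not data from the paper.

```python
import math
from itertools import product

BASES = "ACGT"

def build_pwm(sites, k=1, pseudocount=1.0):
    """Build a log-odds matrix from equal-length aligned sites.

    k=1 yields a 4-row mononucleotide matrix; k=2 yields a 16-row
    dinucleotide matrix over overlapping adjacent pairs.
    Assumes a uniform background (an illustrative simplification).
    """
    symbols = ["".join(p) for p in product(BASES, repeat=k)]
    background = 1.0 / len(symbols)
    width = len(sites[0]) - k + 1  # number of matrix columns
    pwm = []
    for pos in range(width):
        counts = {s: pseudocount for s in symbols}
        for site in sites:
            counts[site[pos:pos + k]] += 1
        total = sum(counts.values())
        pwm.append({s: math.log2(counts[s] / total / background)
                    for s in symbols})
    return pwm

def score(pwm, window, k=1):
    """Sum the matrix entries matched by one sequence window."""
    return sum(col[window[i:i + k]] for i, col in enumerate(pwm))

def scan(pwm, sequence, threshold, k=1):
    """Return (start, score) for every window scoring >= threshold."""
    site_len = len(pwm) + k - 1
    hits = []
    for start in range(len(sequence) - site_len + 1):
        s = score(pwm, sequence[start:start + site_len], k)
        if s >= threshold:
            hits.append((start, s))
    return hits

# Hypothetical toy alignment resembling a GC-box (made-up data).
sites = ["GGGGCGGGGC", "GGGGCGGGGT", "TGGGCGGGGC", "GGGGCGGAGC"]
pwm1 = build_pwm(sites, k=1)   # 10 columns x 4 rows
pwm2 = build_pwm(sites, k=2)   # 9 columns x 16 rows
sequence = "ATATGGGGCGGGGCATAT"

for start, s in scan(pwm1, sequence, threshold=10.0, k=1):
    print(start, round(s, 2))
```

Raising the threshold makes the scan more specific and less sensitive, and lowering it does the reverse. A dinucleotide matrix additionally captures dependencies between neighboring positions, at the cost of needing more training sites to estimate its 16 rows per column.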
Pages: 2290-2301
Page count: 12