Finding motifs in promoter regions

Cited by: 26
Authors
Hertzberg, L [1]
Zuk, O [1]
Getz, G [1]
Domany, E [1]
Affiliations
[1] Weizmann Inst Sci, Dept Phys Complex Syst, IL-76100 Rehovot, Israel
DOI
10.1089/cmb.2005.12.314
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
A central issue in molecular biology is understanding the regulatory mechanisms that control gene expression. The availability of whole-genome sequences opens the way for computational methods that search for the key elements of transcription regulation. These include methods for discovering the binding sites of DNA-binding proteins, such as transcription factors. A common representation of transcription factor binding sites is a position-specific score matrix (PSSM). We developed a probabilistic approach for finding putative binding sites. Given a promoter sequence and a PSSM, we scan the promoter and find the position with the maximal score. We then calculate the probability of obtaining such a maximal score or higher on a random promoter; this is the p-value of the putative binding site. In this way, we searched for putative binding sites in the upstream sequences of Saccharomyces cerevisiae, for which some binding sites are known (according to the Saccharomyces cerevisiae Promoters Database, SCPD). Our method produces either exact p-values or better estimates of them than other methods, which improves the results of the search. For each gene, we found its statistically significant putative binding sites. We measured the true-positive rate by comparison with the known binding sites, and also compared our results with those of MatInspector, a commercially available program that looks for putative binding sites in DNA sequences according to PSSMs. Our results were significantly better. Unlike our method, MatInspector does not calculate the exact statistical significance of its results.
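The scan-and-score step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract reports exact p-values or improved estimates, whereas this sketch substitutes a plain Monte Carlo estimate over random promoters drawn independently from fixed base frequencies. The function names, the toy 4-column PSSM, the uniform background, and the example promoter are all hypothetical and chosen only for illustration.

```python
import random


def max_pssm_score(seq, pssm):
    """Scan seq with the PSSM and return (best_score, best_start).

    pssm is a list of dicts, one per motif column, mapping base -> score.
    Ties are resolved in favour of the leftmost window.
    """
    width = len(pssm)
    best_score, best_pos = float("-inf"), -1
    for start in range(len(seq) - width + 1):
        score = sum(pssm[i][seq[start + i]] for i in range(width))
        if score > best_score:
            best_score, best_pos = score, start
    return best_score, best_pos


def empirical_p_value(observed, pssm, length, background, trials=2000, seed=0):
    """Monte Carlo estimate of P(max score on a random promoter >= observed).

    Assumption: random promoters are i.i.d. draws from `background`
    (base -> probability). This is a stand-in for the exact calculation
    mentioned in the abstract, not a reproduction of it.
    """
    rng = random.Random(seed)
    letters = list(background)
    weights = [background[b] for b in letters]
    hits = 0
    for _ in range(trials):
        rand_seq = "".join(rng.choices(letters, weights=weights, k=length))
        if max_pssm_score(rand_seq, pssm)[0] >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


# Hypothetical toy PSSM whose best-scoring word is "ACGT".
toy_pssm = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
    {"A": -1, "C": -1, "G": -1, "T": 2},
]
uniform_background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
toy_promoter = "TTTTACGTTTTT"

best_score, best_pos = max_pssm_score(toy_promoter, toy_pssm)
p_value = empirical_p_value(best_score, toy_pssm, len(toy_promoter), uniform_background)
print(best_score, best_pos, p_value)
```

A sampling estimate of this kind cannot resolve p-values much smaller than one over the number of trials, which is one practical reason an exact or analytically improved calculation, as the abstract describes, is preferable for genome-wide searches.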
Pages: 314-330
Page count: 17