Identifying transcription factor binding sites through Markov chain optimization

Cited by: 71
Authors
Ellrott, K [1]
Yang, CH
Sladek, FM
Jiang, T
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci, Riverside, CA 92521 USA
[2] Univ Calif Riverside, Genet Bioinformat Program, Riverside, CA 92521 USA
[3] Univ Calif Riverside, Dept Cell Biol & Neurosci, Riverside, CA 92521 USA
[4] Oak Ridge Natl Lab, Div Life Sci, Prot Informat Grp, Oak Ridge, TN 37831 USA
DOI
10.1093/bioinformatics/18.suppl_2.S100
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Although every cell in an organism contains the same genetic material, not every cell expresses the same cohort of genes. One of the major problems facing genomic research today is therefore to determine not only which genes are differentially expressed and under what conditions, but also how the expression of those genes is regulated. The first step in determining differential gene expression is the binding of sequence-specific DNA-binding proteins (i.e. transcription factors) to regulatory regions of the genes (i.e. promoters and enhancers). An important aspect of understanding how a given transcription factor functions is knowing the entire gamut of its binding sites and, in turn, the potential target genes that the factor may bind or regulate. In this study, we developed a computer algorithm that scans genomic databases for transcription factor binding sites, based on a novel Markov chain optimization method, and used it to scan the human genome for sites that bind hepatocyte nuclear factor 4 alpha (HNF4alpha). A list of 71 known HNF4alpha binding sites from the literature was used to train our Markov chain model. By examining a window of 600 nucleotides around the transcription start site of each confirmed gene in the human genome, we identified 849 sites with varying binding potential and experimentally tested 109 of those sites for binding to HNF4alpha. Of the 109 tested sites, 77 proved to be new HNF4alpha binding sites with varying binding affinities (i.e. a 71% success rate). This computational method for searching genomic databases for potential transcription factor binding sites is therefore a powerful tool for investigating mechanisms of differential gene regulation.
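The abstract does not spell out the model, so the following is only a minimal sketch of the general idea of training a Markov chain on known sites and sliding it along a sequence window. It assumes a position-specific first-order chain with a uniform background and add-one pseudocounts; it is not the optimized chain described in the paper. The function names (`train_chain`, `log_odds`, `scan`) and the toy training sites are hypothetical and are not the 71 HNF4alpha sites used by the authors.

```python
import math

ALPHABET = "ACGT"


def train_chain(sites, pseudocount=1.0):
    """Fit a position-specific first-order Markov chain to aligned sites.

    Returns (start, trans): start[b] is P(first base = b), and
    trans[i][a][b] is P(base i+1 = b | base i = a).
    """
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all training sites must have the same length")

    # Distribution of the first base, smoothed with pseudocounts.
    start = {b: pseudocount for b in ALPHABET}
    for s in sites:
        start[s[0]] += 1
    total = sum(start.values())
    start = {b: c / total for b, c in start.items()}

    # One transition table per adjacent pair of positions.
    trans = []
    for i in range(width - 1):
        counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
        for s in sites:
            counts[s[i]][s[i + 1]] += 1
        trans.append({
            a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()
        })
    return start, trans


def log_odds(window, model, background=0.25):
    """Log-odds of a window under the chain versus a uniform background (assumed)."""
    start, trans = model
    score = math.log(start[window[0]] / background)
    for i in range(len(window) - 1):
        score += math.log(trans[i][window[i]][window[i + 1]] / background)
    return score


def scan(sequence, model, threshold):
    """Slide the model along a sequence; return (position, window, score) hits."""
    width = len(model[1]) + 1
    hits = []
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        if set(window) <= set(ALPHABET):  # skip windows containing N or other symbols
            score = log_odds(window, model)
            if score >= threshold:
                hits.append((pos, window, score))
    return hits


# Toy, made-up training sites and sequence for illustration only.
toy_sites = ["AGGTCA", "AGGTCA", "AGGTCG", "GGGTCA"]
toy_model = train_chain(toy_sites)
toy_hits = scan("TTTTAGGTCATTTT", toy_model, threshold=0.0)
```

In a setting like the one the abstract describes, `sequence` would be the 600-nucleotide window around a transcription start site, and the threshold would be chosen empirically; both choices are left open here.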
Pages: S100-S109
Page count: 10