Identification of the binding sites of regulatory proteins in bacterial genomes

Cited by: 67
Authors
Li, H
Rhodius, V
Gross, C
Siggia, ED
Affiliations
[1] Univ Calif San Francisco, Dept Biochem & Biophys, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, Dept Stomatol, San Francisco, CA 94143 USA
[3] Univ Calif San Francisco, Dept Immunol & Microbiol, San Francisco, CA 94143 USA
[4] Rockefeller Univ, Ctr Studies Phys & Biol, New York, NY 10021 USA
Keywords
algorithm; position weight matrix; DNA-binding site; transcription factor; E. coli
DOI
10.1073/pnas.112341999
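Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]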
Subject classification codes
07; 0710; 09
Abstract
We present an algorithm that extracts the binding sites (represented by position-specific weight matrices) for many different transcription factors from the regulatory regions of a genome, without the need to delineate groups of coregulated genes. The algorithm exploits the fact that many DNA-binding proteins in bacteria bind to a bipartite motif in which two short segments are more conserved than the intervening region. It identifies all statistically significant patterns of the form W1NxW2, where W1 and W2 are two short oligonucleotides separated by x arbitrary bases, and groups them into clusters of similar patterns. These clusters are then used to derive quantitative recognition profiles of putative regulatory proteins. For a given cluster, the algorithm finds the matching sequences plus their flanking regions in the genome and performs a multiple sequence alignment to derive position-specific weight matrices. We have analyzed the Escherichia coli genome with this algorithm and found approximately 1,500 significant patterns, which give rise to approximately 160 distinct position-specific weight matrices. A fraction of these matrices match, with high statistical significance, the binding sites of one-third of the approximately 60 characterized transcription factors. Many of the remaining matrices are likely to describe binding sites and regulons of uncharacterized transcription factors. The significance of these matrices was evaluated by their specificity, the location of the predicted sites, and the biological functions of the corresponding regulons, allowing us to suggest putative regulatory functions. The algorithm is efficient for analyzing newly sequenced bacterial genomes for which little is known about transcriptional regulation.
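The core steps described in the abstract (count bipartite patterns W1 Nx W2, keep the over-represented ones, collect their matching sites with flanks, and turn the sites into a position weight matrix) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`dyad_counts`, `significant_dyads`, `matching_sites`, `build_pwm`, `demo_sequences`) are hypothetical. The significance score is an assumption: it compares observed counts with an expectation that treats W1 and W2 as independent k-mers and uses a simple Poisson-style z-score. The sketch also omits reverse complements, clustering of similar patterns, and multiple sequence alignment.

```python
import math
import random
from collections import Counter


def kmer_freqs(seqs, k):
    """Relative frequency of every k-mer across all sequences (background model)."""
    c = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}


def dyad_counts(seqs, k, max_gap):
    """Count every pattern W1 N_x W2 with |W1| = |W2| = k and 0 <= x <= max_gap."""
    counts = Counter()
    for s in seqs:
        for x in range(max_gap + 1):
            span = 2 * k + x
            for i in range(len(s) - span + 1):
                counts[(s[i:i + k], x, s[i + k + x:i + span])] += 1
    return counts


def significant_dyads(seqs, k=3, max_gap=10, z_min=4.0, min_count=3):
    """Return (W1, x, W2, observed, expected, z) for over-represented dyads, best first.

    ASSUMPTION: the expected count treats W1 and W2 as independent, using their
    k-mer frequencies; z = (obs - exp) / sqrt(exp) is a Poisson-style score.
    """
    freqs = kmer_freqs(seqs, k)
    # Number of start positions available to a dyad with spacer length x.
    positions = {x: sum(max(0, len(s) - (2 * k + x) + 1) for s in seqs)
                 for x in range(max_gap + 1)}
    hits = []
    for (w1, x, w2), obs in dyad_counts(seqs, k, max_gap).items():
        exp = positions[x] * freqs[w1] * freqs[w2]
        z = (obs - exp) / math.sqrt(exp)
        if obs >= min_count and z >= z_min:
            hits.append((w1, x, w2, obs, exp, z))
    return sorted(hits, key=lambda h: h[5], reverse=True)


def matching_sites(seqs, w1, x, w2, flank=0):
    """Every occurrence of W1 N_x W2 together with `flank` bases on each side."""
    span = len(w1) + x + len(w2)
    sites = []
    for s in seqs:
        for i in range(flank, len(s) - span - flank + 1):
            if s[i:i + len(w1)] == w1 and s[i + len(w1) + x:i + span] == w2:
                sites.append(s[i - flank:i + span + flank])
    return sites


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds PWM from equal-length sites, treated as already aligned."""
    n = len(sites) + 4 * pseudocount
    pwm = []
    for j in range(len(sites[0])):
        column = Counter(site[j] for site in sites)
        pwm.append({b: math.log2((column[b] + pseudocount) / n / background)
                    for b in "ACGT"})
    return pwm


def pwm_score(pwm, site):
    """Sum of per-position log-odds scores for a site of the matrix's length."""
    return sum(column[base] for column, base in zip(pwm, site))


def demo_sequences(n=30, seed=1):
    """HYPOTHETICAL toy data: random 50-bp sequences, each carrying TTG N4 TAT."""
    rng = random.Random(seed)

    def rand(m):
        return "".join(rng.choice("ACGT") for _ in range(m))

    return [rand(20) + "TTG" + rand(4) + "TAT" + rand(20) for _ in range(n)]


if __name__ == "__main__":
    seqs = demo_sequences()
    w1, x, w2, obs, exp, z = significant_dyads(seqs, k=3, max_gap=6)[0]
    sites = matching_sites(seqs, w1, x, w2, flank=2)
    pwm = build_pwm(sites)
    print(w1, x, w2, obs, round(exp, 2), round(z, 1), len(sites))
```

Every site matched by a single pattern has the same spacer length, so this sketch can treat those sites as already aligned. Merging a whole cluster of related patterns, as the abstract describes, would need a real multiple sequence alignment before the matrix is built.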
Pages: 11772-11777
Number of pages: 6