Identification of co-regulated genes through Bayesian clustering of predicted regulatory binding sites

Cited by: 64
Authors
Qin, ZHS
McCue, LA
Thompson, W
Mayerhofer, L
Lawrence, CE
Liu, JS [1]
Affiliations
[1] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
[2] New York State Dept Hlth, Wadsworth Ctr Labs & Res, Albany, NY 12201 USA
[3] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
DOI
10.1038/nbt802
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
The identification of co-regulated genes and their transcription-factor binding sites (TFBS) is a key step toward understanding transcription regulation. In addition to effective laboratory assays, various computational approaches for the detection of TFBS in promoter regions of coexpressed genes have been developed. The availability of complete genome sequences combined with the likelihood that transcription factors and their cognate sites are often conserved during evolution has led to the development of phylogenetic footprinting [1,2]. The modus operandi of this technique is to search for conserved motifs upstream of orthologous genes from closely related species [1,2]. The method can identify hundreds of TFBS without prior knowledge of co-regulation or coexpression. Because many of these predicted sites are likely to be bound by the same transcription factor, motifs with similar patterns can be put into clusters so as to infer the sets of co-regulated genes, that is, the regulons. This strategy utilizes only genome sequence information and is complementary to and confirmative of gene expression data generated by microarray experiments. However, the limited data available to characterize individual binding patterns, the variation in motif alignment, motif width, and base conservation, and the lack of knowledge of the number and sizes of regulons make this inference problem difficult. We have developed a Gibbs sampling-based [3] Bayesian motif clustering (BMC) algorithm to address these challenges. Tests on simulated data sets show that BMC produces many fewer errors than hierarchical and K-means clustering methods [4]. The application of BMC to hundreds of predicted gamma-proteobacterial motifs [2] correctly identified many experimentally reported regulons, inferred the existence of previously unreported members of these regulons, and suggested novel regulons.
Pages: 435-439
Number of pages: 5
Related papers
20 in total
  • [11] Bayesian models for multiple local sequence alignment and Gibbs sampling strategies
    Liu, JS
    Neuwald, AF
    Lawrence, CE
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1995, 90 (432) : 1156 - 1170
  • [12] Factors influencing the identification of transcription factor binding sites by cross-species comparison
    McCue, LA
    Thompson, W
    Carmack, CS
    Lawrence, CE
    [J]. GENOME RESEARCH, 2002, 12 (10) : 1523 - 1532
  • [13] Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes
    McCue, LA
    Thompson, W
    Carmack, CS
    Ryan, MP
    Liu, JS
    Derbyshire, V
    Lawrence, CE
    [J]. NUCLEIC ACIDS RESEARCH, 2001, 29 (03) : 774 - 782
  • [14] Searching databases of conserved sequence regions by aligning protein multiple-alignments
    Pietrokovski, S
    [J]. NUCLEIC ACIDS RESEARCH, 1996, 24 (19) : 3836 - 3845
  • [15] A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K-12 genome
    Robison, K
    McGuire, AM
    Church, GM
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1998, 284 (02) : 241 - 254
  • [16] Sequence logos - a new way to display consensus sequences
    Schneider, TD
    Stephens, RM
    [J]. NUCLEIC ACIDS RESEARCH, 1990, 18 (20) : 6097 - 6100
  • [17] Probabilistic clustering of sequences: Inferring new bacterial regulons by comparative genomics
    van Nimwegen, E
    Zavolan, M
    Rajewsky, N
    Siggia, ED
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2002, 99 (11) : 7323 - 7328
  • [18] Analysis of regulatory sequences upstream of the Escherichia coli uvrB gene - involvement of the DnaA protein
    VANDENBERG, EA
    GEERSE, RH
    MEMELINK, J
    BOVENBERG, RAL
    MAGNEE, FA
    VANDEPUTTE, P
    [J]. NUCLEIC ACIDS RESEARCH, 1985, 13 (06) : 1829 - 1840
  • [19] A method for direct cloning of Fur-regulated genes: identification of seven new Fur-regulated loci in Escherichia coli
    Vassinova, N
    Kozyrev, D
    [J]. MICROBIOLOGY-SGM, 2000, 146 : 3171 - 3182
  • [20] Walker G.C., 1996, Escherichia coli and Salmonella