Inferring regulatory elements from a whole genome. An analysis of Helicobacter pylori σ80 family of promoter signals

Cited by: 62
Authors
Vanet, A
Marsan, L
Labigne, A
Sagot, MF
Affiliations
[1] Inst Pasteur, Serv Informat Sci, F-75724 Paris, France
[2] Inst Biol Physicochim, CNRS, UPR 9073, F-75005 Paris, France
[3] Inst Pasteur, Unite Pathogenie Bacterienne Muqueuses, F-75724 Paris 15, France
[4] Inst Gaspard Monge, Marne la Vallee, France
Keywords
combined motif; description inference; promoter; Helicobacter pylori; prokaryotes
DOI
10.1006/jmbi.2000.3576
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Helicobacter pylori is adapted to life in a unique niche, the gastric epithelium of primates. Its promoters may therefore be different from those of other bacteria. Here, we determine motifs possibly involved in the recognition of such promoter sequences by the RNA polymerase using a new motif identification method. An important feature of this method is that the motifs are sought with the fewest possible assumptions about what they may look like. The method starts by considering the whole genome of H. pylori and attempts to infer directly from it a description for a family of promoters. This approach therefore differs from searching for such promoters with a previously established description. Its two algorithms are based on the idea of inferring motifs by flexibly comparing words in the sequences with an external object, rather than with one another. The first algorithm infers single motifs; the second infers a combination of two motifs separated from one another by strictly defined, sterically constrained distances. Besides independently finding motifs known to be present in other bacteria, such as the Shine-Dalgarno sequence and the TATA-box, this approach suggests the existence in H. pylori of a new, combined motif, TTAAGC, followed optimally 21 bp downstream by TATAAT. Between these two motifs there is, in some cases, another motif, TTTTAA, or, less frequently, a repetition of TTAAGC separated optimally from the TATA-box by 12 bp. The combined motif TTAAGC x(21 ± 2) TATAAT is present with no errors immediately upstream from the only two copies of the 23S-5S ribosomal RNA genes in H. pylori, and with one error upstream from the only two copies of the 16S ribosomal RNA genes. The operons of both ribosomal RNA molecules are strongly expressed, which is an encouraging sign of the pertinence of the motifs found by the algorithms.
In 25 cases out of a possible 30, the combined motif is found with no more than three substitutions immediately upstream from ribosomal protein genes, or from operons containing a ribosomal protein gene. This is roughly the same frequency of occurrence as for TTGACA x(15-19) TATAAT (with the same maximum number of substitutions allowed), described as the σ70 promoter sequence consensus in Bacillus subtilis and Escherichia coli. The frequency of occurrence of the new motif, TTAAGC x(19-23) TATAAT, remains high when all protein genes in H. pylori are considered, as is the case for the TTGACA x(15-19) TATAAT motif in B. subtilis but not in E. coli. © 2000 Academic Press.
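To make the notation concrete, the following is a minimal sketch of how occurrences of a combined motif such as TTAAGC x(19-23) TATAAT could be located in a sequence while allowing a bounded number of substitutions. It is not the authors' inference algorithm, which derives the motif description from the genome itself; it only illustrates matching against a description that is already known. The function names (`hamming`, `find_combined_motif`) are hypothetical, and the sketch assumes that the spacer length counts the bases lying strictly between the two boxes and that the substitution budget is shared across both boxes.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_combined_motif(seq, left="TTAAGC", right="TATAAT",
                        min_gap=19, max_gap=23, max_subs=3):
    """Return (start, gap, substitutions) for each occurrence of
    left x(min_gap..max_gap) right in seq.

    Assumptions (not taken from the paper): the gap is the number of
    bases between the end of `left` and the start of `right`, and
    `max_subs` bounds the total substitutions over both boxes.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(left) + 1):
        d_left = hamming(seq[i:i + len(left)], left)
        if d_left > max_subs:
            continue  # the left box alone already exceeds the budget
        for gap in range(min_gap, max_gap + 1):
            j = i + len(left) + gap
            window = seq[j:j + len(right)]
            if len(window) < len(right):
                break  # the right box would run past the end of seq
            total = d_left + hamming(window, right)
            if total <= max_subs:
                hits.append((i, gap, total))
    return hits
```

Calling `find_combined_motif(upstream_region, max_subs=0)` on the region upstream of a gene would report only error-free occurrences, whereas the default `max_subs=3` mirrors the tolerance quoted in the abstract for ribosomal protein genes. Swapping in `left="TTGACA", min_gap=15, max_gap=19` gives the analogous search for the σ70 consensus mentioned above.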
Pages: 335-353
Number of pages: 19