A genome-wide survey of short coding sequences in streptococci

Cited by: 62
Authors
Ibrahim, Mariam
Nicolas, Pierre
Bessieres, Philippe
Bolotin, Alexander
Monnet, Veronique
Gardan, Rozenn [1]
Affiliations
[1] INRA, UR477, Unite Biochim Bacterienne, F-78350 Jouy En Josas, France
[2] INRA, UR10777, Unite Math Informat & Genome, F-78350 Jouy En Josas, France
[3] INRA, UR895, Unite Genet Microbienne, F-78350 Jouy En Josas, France
Source
MICROBIOLOGY-SGM | 2007 / Vol. 153
DOI
10.1099/mic.0.2007/006205-0
Chinese Library Classification number
Q93 [Microbiology]
Discipline classification codes
071005; 100705
Abstract
Identification of short genes that encode peptides of fewer than 60 aa is challenging, both experimentally and in silico. As a consequence, the universe of these short coding sequences (CDSs) remains largely unknown, although some are acknowledged to play important roles in cell-cell communication, particularly in Gram-positive bacteria. This paper reports a thorough search for short CDSs across streptococcal genomes. Our bioinformatic approach relied on a combination of advanced intrinsic and extrinsic methods. In the first step, intrinsic sequence information (nucleotide composition and presence of RBSs) served to identify new short putative CDSs (spCDSs) and to eliminate the differences between annotation policies. In the second step, pseudogene fragments and false predictions were filtered out. The last step consisted of screening the remaining spCDSs for lines of extrinsic evidence involving sequence and gene-context comparisons. A total of 789 spCDSs across 20 complete genomes (19 Streptococcus and one Enterococcus) received the support of at least one line of extrinsic evidence, which corresponds to an average of 20 short CDSs per million base pairs. Most of these have no known function, and a significant fraction (31%) are not even annotated as hypothetical genes in GenBank records. As an illustration of the value of this list, we describe a new family of CDSs, encoding very short hydrophobic peptides (20-23 aa) situated just upstream of some of the positive transcriptional regulators of the Rgg family. The expression of seven other short CDSs from Streptococcus thermophilus CNRZ1066 that encode peptides ranging in length from 41 to 56 aa was confirmed by real-time quantitative RT-PCR and revealed a variety of expression patterns. Finally, one peptide from this list, encoded by a gene that is not annotated in GenBank, was identified in a cell-envelope-enriched fraction of S. thermophilus CNRZ1066.
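The first, intrinsic step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' method: it only enumerates forward-strand open reading frames that encode fewer than 60 aa and flags whether an RBS-like motif lies just upstream. The function name `find_short_orfs`, the start-codon set, the `GGAGG` motif, the 20-nt upstream window, and the 10-aa lower bound are all assumptions chosen for illustration; the nucleotide-composition model mentioned in the abstract is not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of bacterial start codons
RBS_CORE = "GGAGG"                    # assumed Shine-Dalgarno-like core motif


def find_short_orfs(seq, max_aa=59, min_aa=10, rbs_window=20):
    """Return forward-strand ORFs encoding min_aa..max_aa residues.

    Each hit records the 0-based start, the end (exclusive, stop codon
    included), the peptide length in residues, and whether the assumed
    RBS motif occurs within rbs_window nucleotides upstream of the start.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # residues, stop codon excluded
                if min_aa <= n_aa <= max_aa:
                    upstream = seq[max(0, start - rbs_window):start]
                    hits.append({
                        "start": start,
                        "end": i + 3,
                        "aa": n_aa,
                        "has_rbs": RBS_CORE in upstream,
                    })
                start = None  # resume scanning after the stop codon
    return hits


# Toy sequence: RBS-like motif, then a 21-residue ORF (Met + 20 Ala).
toy = "AAGGAGGAAAAA" + "ATG" + "GCT" * 20 + "TAA"
short_hits = find_short_orfs(toy)

# An ORF of 61 residues exceeds the 60-aa cutoff and is not reported.
long_hits = find_short_orfs("ATG" + "GCT" * 60 + "TAA")
```

A real survey would also scan the reverse strand and, as in the second and third steps of the abstract, filter pseudogene fragments and require extrinsic evidence before accepting a candidate.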
Pages: 3631-3644
Number of pages: 14