De novo computational prediction of non-coding RNA genes in prokaryotic genomes

Cited by: 33
Authors
Tran, Thao T. [1, 4]
Zhou, Fengfeng [1]
Marshburn, Sarah [2]
Stead, Mark [2]
Kushner, Sidney R. [2]
Xu, Ying [1, 3]
Affiliations
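[1] University of Georgia, Department of Biochemistry and Molecular Biology, Computational Systems Biology Laboratory, Institute of Bioinformatics and BioEnergy Science Center (BESC), Athens, GA 30602, USA
[2] University of Georgia, Department of Genetics, Athens, GA 30602, USA
[3] Jilin University, College of Computer Science and Technology, Changchun 130023, People's Republic of China
[4] Georgia Institute of Technology, School of Electrical and Computer Engineering, Atlanta, GA 30332, USA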
Funding
US National Institutes of Health (NIH); US National Science Foundation (NSF)
Keywords
secondary structure prediction; Escherichia coli; messenger RNAs; sequence alignments; identification; dinucleotide; algorithm; microarrays; ensemble; bacteria
DOI
10.1093/bioinformatics/btp537
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The computational identification of non-coding RNA (ncRNA) genes represents one of the most important and challenging problems in computational biology. Existing methods for ncRNA gene prediction rely mostly on homology information, thus limiting their applications to ncRNA genes with known homologues.

Results: We present a novel de novo prediction algorithm for ncRNA genes using features derived from the sequences and structures of known ncRNA genes in comparison to decoys. Using these features, we have trained a neural network-based classifier and have applied it to Escherichia coli and Sulfolobus solfataricus for genome-wide prediction of ncRNAs. Our method has an average prediction sensitivity and specificity of 68% and 70%, respectively, for identifying windows with potential for ncRNA genes in E. coli. By combining windows of different sizes and using positional filtering strategies, we predicted 601 candidate ncRNAs and recovered 41% of known ncRNAs in E. coli. We experimentally investigated six novel candidates using Northern blot analysis and found expression of three candidates: one represents a potential new ncRNA, one is associated with stable mRNA decay intermediates and one is a case of either a potential riboswitch or transcription attenuator involved in the regulation of cell division. In general, our approach enables the identification of both cis- and trans-acting ncRNAs in partially or completely sequenced microbial genomes without requiring homology or structural conservation.
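The abstract describes a window-based pipeline: scan the genome in windows, describe each window by features, contrast known ncRNAs with decoys, classify, then combine windows of different sizes and filter. The Python sketch below illustrates only the bookkeeping around that pipeline and is not the authors' implementation. Its assumptions: every function name is hypothetical; GC content and dinucleotide frequencies stand in for the paper's sequence features, and structural features are omitted; a plain mononucleotide shuffle stands in for the decoy construction, which the abstract does not describe; the neural-network classifier itself is left out; merging overlapping positive windows is one possible reading of "combining windows of different sizes"; and specificity is taken as TN/(TN+FP), although some gene-prediction work uses that word for TP/(TP+FP).

```python
import random
from itertools import product

# All names here are hypothetical; the abstract does not give the paper's
# feature set, window sizes, decoy generator, or classifier settings.

# The 16 RNA dinucleotides, in a fixed order (AA, AC, ..., UU).
DINUCS = ["".join(p) for p in product("ACGU", repeat=2)]


def sliding_windows(seq, size, step):
    """Yield (start, window) for every full-length window of `size`, moving by `step`."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def sequence_features(window):
    """Return [GC content] + 16 dinucleotide frequencies for one window.

    DNA input is read as RNA (T -> U). Structural features such as minimum
    folding free energy would need an RNA folding tool and are omitted here.
    """
    window = window.upper().replace("T", "U")
    if not window:
        raise ValueError("empty window")
    n = len(window)
    gc = (window.count("G") + window.count("C")) / n
    counts = dict.fromkeys(DINUCS, 0)
    for i in range(n - 1):
        pair = window[i:i + 2]
        if pair in counts:  # skip pairs containing N or other ambiguity codes
            counts[pair] += 1
    total = max(n - 1, 1)
    return [gc] + [counts[d] / total for d in DINUCS]


def shuffled_decoy(window, rng):
    """Mononucleotide shuffle: keeps base composition, destroys base order.

    A simplified stand-in; dinucleotide-preserving shuffles are a stricter
    control but take more code.
    """
    letters = list(window)
    rng.shuffle(letters)
    return "".join(letters)


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP) (assumed definition)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def merge_positive_windows(intervals):
    """Merge overlapping or touching (start, end) windows into candidate regions."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "AUGGCGCUAGCGGCCGAUUACGGCGCUUAAGCGC"  # toy sequence, not real data
    for start, win in sliding_windows(genome, size=12, step=6):
        real = sequence_features(win)
        decoy = sequence_features(shuffled_decoy(win, rng))
        print(start, f"GC real={real[0]:.2f} decoy={decoy[0]:.2f}")
    # Hypothetical positive windows from two window sizes, merged into regions.
    print(merge_positive_windows([(0, 12), (6, 18), (24, 48)]))
```

Because a mononucleotide shuffle preserves base composition, the real and decoy GC values in the demo always match; only order-dependent features such as dinucleotide frequencies (or folding energies, omitted here) can separate a window from its shuffled decoy. This is why dinucleotide-preserving shuffles are often preferred when decoys should also control for local composition.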
Pages: 2897-2905
Page count: 9
Related papers (52 in total)
[1] Altschul SF. Molecular Biology and Evolution, 1985, 2: 526.
[2] Argaman L, Hershberg R, Vogel J, Bejerano G, Wagner EGH, Margalit H, Altuvia S. Novel small RNA-encoding genes in the intergenic regions of Escherichia coli. Current Biology, 2001, 11(12): 941-950.
[3] Bernstein JA, Lin PH, Cohen SN, Lin-Chao S. Global analysis of Escherichia coli RNA degradosome function using DNA microarrays. Proceedings of the National Academy of Sciences of the United States of America, 2004, 101(9): 2758-2763.
[4] Carter RJ, Dubchak I, Holbrook SR. A computational approach to identify genes for functional RNAs in genomic sequences. Nucleic Acids Research, 2001, 29(19): 3928-3938.
[5] Chan CY, Ding Y. Boltzmann ensemble features of RNA secondary structures: a comparative analysis of biological RNA sequences and random shuffles. Journal of Mathematical Biology, 2008, 56(1-2): 93-105.
[6] Chen S, Lesnik EA, Hall TA, Sampath R, Griffey RH, Ecker DJ, Blyn LB. A bioinformatics based approach to discover small RNA genes in the Escherichia coli genome. Biosystems, 2002, 65(2-3): 157-177.
[7] Clote P, Ferré F, Kranakis E, Krizanc D. Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency. RNA, 2005, 11(5): 578-591.
[8] Coventry A, Kleitman DJ, Berger B. MSARI: Multiple sequence alignments for statistical detection of RNA secondary structure. Proceedings of the National Academy of Sciences of the United States of America, 2004, 101(33): 12102-12107.
[9] di Bernardo D, Down T, Hubbard T. ddbRNA: detection of conserved secondary structures in multiple alignments. Bioinformatics, 2003, 19(13): 1606-1611.
[10] Ding Y, Chan CY, Lawrence CE. RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble. RNA, 2005, 11(8): 1157-1166.