A computational approach to identify genes for functional RNAs in genomic sequences

Cited by: 142
Authors
Carter, RJ
Dubchak, I
Holbrook, SR
Affiliations
[1] Lawrence Berkeley Natl Lab, Phys Biosci Div, Computat & Theoret Biol Dept, Berkeley, CA 94720 USA
[2] Lawrence Berkeley Natl Lab, Natl Energy Res Sci Comp Ctr, Berkeley, CA 94720 USA
DOI
10.1093/nar/29.19.3928
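Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry]
Abstract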
Currently there is no successful computational approach for identification of genes encoding novel functional RNAs (fRNAs) in genomic sequences. We have developed a machine learning approach using neural networks and support vector machines to extract common features among known RNAs for prediction of new RNA genes in the unannotated regions of prokaryotic and archaeal genomes. The Escherichia coli genome was used for development, but we have applied this method to several other bacterial and archaeal genomes. Networks based on nucleotide composition were 80-90% accurate in jackknife testing experiments for bacteria and 90-99% for hyperthermophilic archaea. We also achieved a significant improvement in accuracy by combining these predictions with those obtained using a second set of parameters consisting of known RNA sequence motifs and the calculated free energy of folding. Several known fRNAs not included in the training datasets were identified as well as several hundred predicted novel RNAs. These studies indicate that there are many unidentified RNAs in simple genomes that can be predicted computationally as a precursor to experimental study. Public access to our RNA gene predictions and an interface for user predictions is available via the web.
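The abstract names the ingredients of the method: nucleotide-composition features, a trained classifier, and jackknife testing. It does not give the exact feature set or model settings. The snippet below is a minimal sketch under stated assumptions, not the authors' implementation.

- The features are assumed to be mono- and dinucleotide frequencies.
- A simple nearest-centroid classifier stands in for the neural networks and support vector machines used in the paper.
- "Jackknife" is taken to mean leave-one-out: each sequence is classified by a model built from all the other sequences.
- The function names and toy sequences are hypothetical.

```python
from itertools import product
from math import sqrt


def composition_features(seq, k_max=2):
    """Hypothetical feature vector: relative frequencies of every k-mer, k = 1..k_max.

    Windows containing non-ACGT characters are skipped.
    """
    seq = seq.upper()
    feats = []
    for k in range(1, k_max + 1):
        kmers = ["".join(p) for p in product("ACGT", repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        total = 0
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if window in counts:
                counts[window] += 1
                total += 1
        feats.extend(counts[m] / total if total else 0.0 for m in kmers)
    return feats


def nearest_centroid_predict(train, x):
    """Stand-in classifier (assumption): pick the class whose mean feature vector is closest to x."""
    centroids = {}
    for label in {lab for _, lab in train}:
        vecs = [f for f, lab in train if lab == label]
        centroids[label] = [sum(col) / len(vecs) for col in zip(*vecs)]

    def dist(a, b):
        return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    return min(centroids, key=lambda lab: dist(centroids[lab], x))


def jackknife_accuracy(data):
    """Leave-one-out accuracy over a list of (features, label) pairs."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]  # hold out sequence i
        if nearest_centroid_predict(train, x) == y:
            correct += 1
    return correct / len(data)


# Hypothetical toy data: GC-rich "RNA" examples versus AT-rich "background" examples.
toy_rna = ["GGCGCCGGCG", "GCGGCCGCGC", "CCGGCGGCGC"]
toy_background = ["ATATTAAATA", "TTAATATATA", "AATTATTTAA"]
toy_data = [(composition_features(s), "RNA") for s in toy_rna] + [
    (composition_features(s), "background") for s in toy_background
]

if __name__ == "__main__":
    print(jackknife_accuracy(toy_data))
```

On this deliberately separable toy set, leave-one-out accuracy is 1.0. Real genomic windows are far noisier, which is why the reported accuracies vary by organism. A nearest-centroid rule was chosen only to keep the sketch dependency-free. Swapping in an SVM or a neural network would change only `nearest_centroid_predict`.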
Pages: 3928-3938
Number of pages: 11