Candidates for novel RNA topologies

Cited by: 44
Authors
Kim, N
Shiffeldrim, N
Gan, HH
Schlick, T
Affiliations
[1] NYU, Dept Chem, New York, NY 10003 USA
[2] NYU, Courant Inst Math Sci, New York, NY 10012 USA
Keywords
RNA secondary structure; novel RNA; pseudoknot; graph theory; clustering algorithm;
DOI
10.1016/j.jmb.2004.06.054
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Because the functional repertoire of RNA molecules, like proteins, is closely linked to the diversity of their shapes, uncovering RNA's structural repertoire is vital for identifying novel RNAs, especially in genomic sequences. To help expand the limited number of known RNA families, we use graphical representation and clustering analysis of RNA secondary structures to predict novel RNA topologies and their abundance as a function of size. Representing the essential topological properties of RNA secondary structures as graphs enables enumeration, generation, and prediction of novel RNA motifs. We apply a probabilistic graph-growing method to construct the RNA structure space encompassing the topologies of existing and hypothetical RNAs and cluster all RNA topologies into two groups using topological descriptors and a standard clustering algorithm. Significantly, we find that nearly all existing RNAs fall into one group, which we refer to as "RNA-like"; we consider the other group "non-RNA-like". Our method predicts many candidates for novel RNA secondary topologies, some of which are remarkably similar to existing structures; interestingly, the centroid of the RNA-like group is the tmRNA fold, a pseudoknot having both tRNA-like and mRNA-like functions. Additionally, our approach allows estimation of the relative abundance of pseudoknot and other (e.g. tree) motifs using the "edge-cut" property of RNA graphs. This analysis suggests that pseudoknots dominate the RNA structure universe, representing more than 90% when the sequence length exceeds 120 nt; the predicted trend for <100 nt agrees with data for existing RNAs. Together with our predictions for novel "RNA-like" topologies, our analysis can help direct the design of functional RNAs and identification of novel RNA folds in genomes through an efficient topology-directed search, which grows much more slowly in complexity with RNA size than the traditional sequence-based search. © 2004 Elsevier Ltd. All rights reserved.
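The "edge-cut" idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the graph convention or the exact test, so everything here is an assumption: the topology is taken to be a connected undirected multigraph given as a vertex count plus an edge list, and a "cut edge" is taken to mean an edge whose removal disconnects the graph. The function names (`is_connected`, `has_cut_edge`) are hypothetical and are not from the paper. Under a convention in which pseudoknotted topologies are exactly those with no cut edge, `has_cut_edge` returning `False` would flag a pseudoknot candidate.

```python
from collections import defaultdict


def is_connected(num_vertices, edges):
    """Return True if vertices 0..num_vertices-1 form one connected component."""
    if num_vertices == 0:
        return True
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    # Depth-first search starting from vertex 0.
    seen = {0}
    stack = [0]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == num_vertices


def has_cut_edge(num_vertices, edges):
    """Return True if deleting some single edge disconnects the graph.

    Edges are removed one at a time by list position, so parallel edges
    (multigraphs) are handled: deleting one copy leaves the other in place.
    """
    for i in range(len(edges)):
        remaining = edges[:i] + edges[i + 1:]
        if not is_connected(num_vertices, remaining):
            return True
    return False


# Hypothetical example graphs (not taken from the paper).
path_graph = [(0, 1), (1, 2)]              # every edge is a cut edge
triangle = [(0, 1), (1, 2), (0, 2)]        # no single edge disconnects it
double_edge = [(0, 1), (0, 1)]             # parallel edges, no cut edge
```

This brute-force check is quadratic in the number of edges, which is fine for the small graphs used to represent RNA topologies; a linear-time bridge-finding algorithm could be substituted for larger graphs.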
Pages: 1129-1144
Number of pages: 16