CMfinder - a covariance model based RNA motif finding algorithm

Cited by: 246
Authors
Yao, ZZ [1]
Weinberg, Z
Ruzzo, WL
Affiliations
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
[2] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
DOI
10.1093/bioinformatics/btk008
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: The recent discoveries of large numbers of non-coding RNAs and computational advances in genome-scale RNA search create a need for tools for automatic, high-quality identification and characterization of conserved RNA motifs that can be readily used for database search. Previous tools fall short of this goal.
Results: CMfinder is a new tool to predict RNA motifs in unaligned sequences. It is an expectation maximization algorithm that uses covariance models for motif description. It features a novel integration of multiple techniques for effective search of motif space, and a Bayesian framework that blends mutual information-based and folding energy-based approaches to predict structure in a principled way. Extensive tests show that our method works well on datasets with either low or high sequence similarity, is robust to inclusion of lengthy extraneous flanking sequence and/or completely unrelated sequences, and is reasonably fast and scalable. In testing on 19 known ncRNA families, including some difficult cases with poor sequence conservation and large indels, our method demonstrates excellent average per-base-pair accuracy: 79%, compared with at most 60% for alternative methods. More importantly, the resulting probabilistic model can be directly used for homology search, allowing iterative refinement of structural models based on additional homologs. We have used this approach to obtain highly accurate covariance models of known RNA motifs based on small numbers of related sequences, which identified homologs in deeply diverged species.
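The abstract names mutual information as one of the two signals blended in the structure prior but does not spell out the computation. As a rough illustration only, the sketch below computes the textbook mutual information between two columns of a multiple sequence alignment, the quantity that is high when two positions covary as base-paired positions tend to. This is not CMfinder's implementation: the function name `column_mutual_information`, the decision to skip gapped rows, and the toy columns are all assumptions made for the example.

```python
import math
from collections import Counter


def column_mutual_information(col_i, col_j, gap="-"):
    """Mutual information (in bits) between two alignment columns.

    col_i and col_j hold one character per aligned sequence.
    Rows with a gap in either column are skipped (an assumption
    of this sketch, not something stated in the abstract).
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of rows")

    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != gap and b != gap]
    n = len(pairs)
    if n == 0:
        return 0.0

    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)
    right_counts = Counter(b for _, b in pairs)

    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        ratio = (count * n) / (left_counts[a] * right_counts[b])
        mi += p_ab * math.log2(ratio)
    return mi


if __name__ == "__main__":
    # Hypothetical toy columns: perfect covariation vs. no covariation.
    print(column_mutual_information("GCAU", "CGUA"))
    print(column_mutual_information("AACC", "GUGU"))
```

Columns that always change together score high, while columns that are fully conserved or vary independently score zero. Covariation alone therefore carries no signal for highly conserved sequences, which is consistent with the abstract's choice to combine it with a folding-energy term.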
Pages: 445-452
Page count: 8