Specific alignment of structured RNA: stochastic grammars and sequence annealing
Cited by: 23
Authors:
Bradley, Robert K. [2]
Pachter, Lior [1]
Holmes, Ian [2,3]
Affiliations:
[1] Univ Calif Berkeley, Dept Math, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Biophys Grad Grp, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Dept Bioengn, Berkeley, CA 94720 USA
Motivation: Whole-genome screens suggest that eukaryotic genomes are dense with non-coding RNAs (ncRNAs). We introduce a novel approach to RNA multiple alignment which couples a generative probabilistic model of sequence and structure with an efficient sequence annealing approach for exploring the space of multiple alignments. This leads to a new software program, Stemloc-AMA, that is both accurate and specific in the alignment of multiple related RNA sequences.
Results: When tested on the benchmark datasets BRalibase II and BRalibase 2.1, Stemloc-AMA has comparable sensitivity to and better specificity than the best competing methods. We use a large-scale random sequence experiment to show that while most alignment programs maximize sensitivity at the expense of specificity, even to the point of giving complete alignments of non-homologous sequences, Stemloc-AMA aligns only sequences with detectable homology and leaves unrelated sequences largely unaligned. Such accurate and specific alignments are crucial for comparative-genomics analysis, from inferring phylogeny to estimating substitution rates across different lineages.
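The trade-off between sensitivity and specificity described in the results can be made concrete with a small scoring sketch. This is an illustration only, not the evaluation code used for Stemloc-AMA: it assumes that an alignment is scored by its set of aligned residue pairs and that "specificity" is measured as positive predictive value (PPV), a common convention in alignment benchmarks that this abstract does not confirm. The function names `aligned_pairs` and `sensitivity_and_ppv` and the toy sequences are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]


def aligned_pairs(row_a: str, row_b: str, gap: str = "-") -> Set[Pair]:
    """Return the residue-index pairs (i, j) that share a column.

    row_a and row_b are the two gapped rows of a pairwise alignment;
    i and j are 0-based positions in the ungapped sequences.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs: Set[Pair] = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def sensitivity_and_ppv(predicted: Set[Pair], reference: Set[Pair]) -> Tuple[float, float]:
    """Sensitivity = TP / |reference|; PPV = TP / |predicted|.

    Assumed convention: an empty denominator yields 1.0, so a program
    that aligns nothing makes no false claims and gets PPV 1.0.
    """
    true_pos = len(predicted & reference)
    sens = true_pos / len(reference) if reference else 1.0
    ppv = true_pos / len(predicted) if predicted else 1.0
    return sens, ppv


# Toy reference alignment of ACGUA and ACUGA (hypothetical data).
reference = aligned_pairs("ACGU-A", "AC-UGA")

# An "align everything" prediction: more pairs, some of them wrong.
greedy = aligned_pairs("ACGUA", "ACUGA")

# A conservative prediction: aligns only the first two residues.
cautious = aligned_pairs("ACGUA---", "AC---UGA")

print(sensitivity_and_ppv(greedy, reference))    # (0.75, 0.6)
print(sensitivity_and_ppv(cautious, reference))  # (0.5, 1.0)
```

In this toy case the gapless prediction recovers more true pairs but also asserts false ones, while the conservative prediction asserts fewer pairs and gets all of them right. For two unrelated sequences the reference set is empty, so any aligned pair is a false positive.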