Sequence-based heuristics for faster annotation of non-coding RNA families

Cited by: 60
Authors
Weinberg, Z [1]
Ruzzo, WL
Affiliations
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
[2] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
DOI
10.1093/bioinformatics/bti743
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool for finding new members of an ncRNA gene family in a large genome database, using both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are extremely slow. Previously, we created rigorous filters, which provably sacrifice none of a CM's accuracy while making searches significantly faster for virtually all ncRNA families. However, these rigorous filters make searches slower than heuristics could be.
Results: In this paper we introduce profile HMM-based heuristic filters. We show that their accuracy is usually superior to that of heuristics based on BLAST. Moreover, we compared our heuristics with those used in tRNAscan-SE, whose heuristics incorporate a significant amount of work specific to tRNAs, whereas our heuristics are generic to any ncRNA. Performance was roughly comparable, so we expect that our heuristics provide a high-quality solution that, unlike family-specific solutions, can scale to hundreds of ncRNA families.
Pages: 35-39
Page count: 5