HMMSplicer: A Tool for Efficient and Sensitive Discovery of Known and Novel Splice Junctions in RNA-Seq Data

Cited by: 33
Authors
Dimon, Michelle T. [1, 2]
Sorber, Katherine [1]
DeRisi, Joseph L. [1, 3]
Affiliations
[1] Univ Calif San Francisco, Dept Biochem & Biophys, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, Biol & Med Informat Program, San Francisco, CA 94143 USA
[3] Howard Hughes Med Inst, Bethesda, MD 20817 USA
Source
PLOS ONE | 2010 / Volume 5 / Issue 11
Keywords
MESSENGER-RNA; SEQUENCE; TRANSCRIPTOME; ALIGNMENT
DOI
10.1371/journal.pone.0013875
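Chinese Library Classification codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09
Abstract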
Background: High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially in the case where a splice junction is previously unknown. Methodology/Principal Findings: Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets without a reduction in specificity. Conclusions/Significance: HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on pre-built gene models, the products of inexact splicing are also detected. For H. sapiens, we find 3.6% of 3' splice sites and 1.4% of 5' splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction, allowing the user to set a threshold to tune false positive rates depending on the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available at http://derisilab.ucsf.edu/software/hmmsplicer.
Pages: 16