STAR: ultrafast universal RNA-seq aligner

Cited by: 33081
Authors
Dobin, Alexander [1]
Davis, Carrie A. [1]
Schlesinger, Felix [1]
Drenkow, Jorg [1]
Zaleski, Chris [1]
Jha, Sonali [1]
Batut, Philippe [1]
Chaisson, Mark [2]
Gingeras, Thomas R. [1]
Affiliations
[1] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA
[2] Pacific Biosci, Menlo Pk, CA USA
Keywords
SPLICE JUNCTIONS; ALIGNMENT; ALGORITHMS; SEQUENCE; ENCODE
DOI
10.1093/bioinformatics/bts635
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases.
Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy.
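The seed search named in the abstract can be illustrated with a toy example. The sketch below is not STAR's implementation: it assumes exact matching only, a forward search only, a naively built suffix array, and hypothetical function names (`build_suffix_array`, `maximal_mappable_prefix`, `sequential_seeds`). It shows the general idea of repeatedly taking the longest read prefix that occurs in the reference, then restarting the search from the first unmatched base, so that a read spanning a splice junction splits into two seeds.

```python
def build_suffix_array(text):
    """Naive suffix array (sorted suffix start positions); toy inputs only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def _char_at(text, sa, idx, depth):
    """Character `depth` bases into the idx-th sorted suffix ('' past the end)."""
    pos = sa[idx] + depth
    return text[pos] if pos < len(text) else ""


def _bound(text, sa, lo, hi, depth, c, upper):
    """Binary search in sa[lo:hi] for the lower (or upper) bound of character c."""
    while lo < hi:
        mid = (lo + hi) // 2
        ch = _char_at(text, sa, mid, depth)
        if ch < c or (upper and ch == c):
            lo = mid + 1
        else:
            hi = mid
    return lo


def maximal_mappable_prefix(text, sa, query):
    """Return (length, positions) of the longest exact prefix of query in text."""
    lo, hi, depth = 0, len(sa), 0
    while depth < len(query):
        c = query[depth]
        new_lo = _bound(text, sa, lo, hi, depth, c, upper=False)
        new_hi = _bound(text, sa, new_lo, hi, depth, c, upper=True)
        if new_lo == new_hi:  # no suffix extends the match by one more base
            break
        lo, hi, depth = new_lo, new_hi, depth + 1
    positions = sorted(sa[lo:hi]) if depth > 0 else []
    return depth, positions


def sequential_seeds(text, sa, read):
    """Split a read into consecutive seeds: (read_offset, length, positions)."""
    seeds, start = [], 0
    while start < len(read):
        length, positions = maximal_mappable_prefix(text, sa, read[start:])
        if length == 0:
            start += 1  # assumption: skip a base that matches nowhere
            continue
        seeds.append((start, length, positions))
        start += length
    return seeds


# Toy reference: exon "ACGTAC", an 8-base intron "GTTTTTAG", exon "CCGGAA".
genome = "ACGTACGTTTTTAGCCGGAA"
sa = build_suffix_array(genome)
# A read covering the last 4 bases of exon 1 and the first 4 bases of exon 2.
print(sequential_seeds(genome, sa, "GTACCCGG"))
# -> [(0, 4, [2]), (4, 4, [14])]
```

In this toy case the first seed ends at reference position 6 and the second starts at position 14, so the 8-base gap between them corresponds to the intron. Clustering and stitching such seeds into a full alignment, as the abstract describes, is outside the scope of this sketch.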
Pages: 15-21
Page count: 7