The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote

Cited by: 2003
Authors
Liao, Yang [1, 2]
Smyth, Gordon K. [1, 3]
Shi, Wei [1, 2]
Affiliations
[1] Walter & Eliza Hall Inst Med Res, Div Bioinformat, Parkville, Vic 3052, Australia
[2] Univ Melbourne, Dept Comp & Informat Syst, Parkville, Vic 3010, Australia
[3] Univ Melbourne, Dept Math & Stat, Parkville, Vic 3010, Australia
Funding
National Health and Medical Research Council (Australia); Medical Research Council (UK)
Keywords
DIFFERENTIAL EXPRESSION ANALYSIS; ALIGNMENT; SEQUENCE; GENOME; SEARCH; NORMALIZATION; ALGORITHM;
DOI
10.1093/nar/gkt214
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Read alignment is an ongoing challenge for the analysis of data from sequencing technologies. This article proposes an elegantly simple multi-seed strategy, called seed-and-vote, for mapping reads to a reference genome. The new strategy chooses the mapped genomic location for the read directly from the seeds. It uses a relatively large number of short seeds (called subreads) extracted from each read and allows all the seeds to vote on the optimal location. When the read length is <160 bp, overlapping subreads are used. More conventional alignment algorithms are then used to fill in detailed mismatch and indel information between the subreads that make up the winning voting block. The strategy is fast because the overall genomic location has already been chosen before the detailed alignment is done. It is sensitive because no individual subread is required to map exactly, nor are individual subreads constrained to map close by other subreads. It is accurate because the final location must be supported by several different subreads. The strategy extends easily to find exon junctions, by locating reads that contain sets of subreads mapping to different exons of the same gene. It scales up efficiently for longer reads.
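The voting step described in the abstract can be illustrated with a toy sketch. This is not the Subread implementation: the function names `build_index` and `seed_and_vote`, the parameters `k` and `step`, and the use of a plain exact-match k-mer dictionary as the index are all assumptions made for illustration. The sketch only picks a candidate location. It omits the later step that fills in mismatches and indels between subreads, and it does not handle exon junctions.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts.

    Hypothetical helper: a plain dictionary stands in for a real genome index.
    """
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_vote(read, index, k, step):
    """Return (candidate_start, votes) for the best-supported location, or None.

    Subreads of length k are taken every `step` bases; they overlap when
    step < k. Each exact hit of a subread votes for the read start position
    that the hit implies.
    """
    votes = Counter()
    for offset in range(0, len(read) - k + 1, step):
        subread = read[offset:offset + k]
        for hit in index.get(subread, ()):
            votes[hit - offset] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]


if __name__ == "__main__":
    reference = "ACGTTGCATGCCGATAGCTAGGCTTAACGGATCCGTA"
    # reference[10:26] with one substituted base (C -> T at read position 7).
    read = "CCGATAGTTAGGCTTA"
    index = build_index(reference, k=4)
    print(seed_and_vote(read, index, k=4, step=2))
```

In this toy run, the two subreads that span the substituted base find no hit and cast no vote, while the remaining five agree on the same start position. That mirrors the abstract's point that no individual subread is required to map exactly, yet the chosen location is supported by several subreads.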
Pages: 17