SHRiMP: Accurate Mapping of Short Color-space Reads

Times cited: 383
Authors
Rumble, Stephen M. [1, 2]
Lacroute, Phil [3, 4]
Dalca, Adrian V. [1]
Fiume, Marc [1]
Sidow, Arend [3, 4]
Brudno, Michael [1, 5]
Affiliations
[1] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[4] Stanford Univ, Dept Pathol, Stanford, CA 94305 USA
[5] Univ Toronto, Banting & Best Dept Med Res, Toronto, ON, Canada
Funding
Canada Foundation for Innovation; Natural Sciences and Engineering Research Council of Canada
Keywords
SPEED-UP; SEQUENCE; GENOME;
DOI
10.1371/journal.pcbi.1000386
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010 ; 081704 ;
Abstract
The development of Next Generation Sequencing technologies, capable of sequencing hundreds of millions of short reads (25-70 bp each) in a single run, is opening the door to population genomic studies of non-model species. In this paper we present SHRiMP, the SHort Read Mapping Package: a set of algorithms and methods to map short reads to a genome, even in the presence of a large amount of polymorphism. Our method is based upon a fast read mapping technique, separate thorough alignment methods for regular letter-space as well as AB SOLiD (color-space) reads, and a statistical model for false positive hits. We use SHRiMP to map reads from a newly sequenced Ciona savignyi individual to the reference genome. We demonstrate that SHRiMP can accurately map reads to this highly polymorphic genome, while confirming high heterozygosity of C. savignyi in this second individual. SHRiMP is freely available at http://compbio.cs.toronto.edu/shrimp.
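The abstract distinguishes letter-space reads from AB SOLiD color-space reads. As a rough illustration of what that distinction means, the following Python sketch encodes a letter-space sequence into colors using the standard di-base scheme, in which each color is the XOR of the 2-bit codes of two adjacent bases. This is not SHRiMP's code: the function names `encode_colors` and `decode_colors` are hypothetical and chosen only for this example.

```python
# Illustrative sketch only; not taken from SHRiMP.
# With these 2-bit codes, the SOLiD color of two adjacent bases is their XOR.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode_colors(seq):
    """Return the color string for a letter-space sequence (length len(seq) - 1)."""
    codes = [BASE_CODE[base] for base in seq.upper()]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


def decode_colors(first_base, colors):
    """Rebuild the letter-space sequence from a known first base and its colors."""
    code = BASE_CODE[first_base.upper()]
    out = [CODE_BASE[code]]
    for color in colors:
        code ^= int(color)  # each color gives the step from one base to the next
        out.append(CODE_BASE[code])
    return "".join(out)


if __name__ == "__main__":
    print(encode_colors("ATGGC"))       # colors of the true sequence
    print(decode_colors("A", "3103"))   # clean read decodes back to ATGGC
    print(decode_colors("A", "3203"))   # one miscalled color changes every later base
```

Because each base is decoded relative to the previous one, a single miscalled color alters every base after it when the read is naively translated. This is one reason a mapper might align color-space reads with a dedicated method rather than converting them to letters first.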
Pages: 11