SHRiMP: Accurate Mapping of Short Color-space Reads

Citations: 383
Authors
Rumble, Stephen M. [1, 2]
Lacroute, Phil [3, 4]
Dalca, Adrian V. [1]
Fiume, Marc [1]
Sidow, Arend [3, 4]
Brudno, Michael [1, 5]
Affiliations
[1] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[4] Stanford Univ, Dept Pathol, Stanford, CA 94305 USA
[5] Univ Toronto, Banting & Best Dept Med Res, Toronto, ON, Canada
Funding
Canada Foundation for Innovation; Natural Sciences and Engineering Research Council of Canada
Keywords
SPEED-UP; SEQUENCE; GENOME;
DOI
10.1371/journal.pcbi.1000386
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The development of Next Generation Sequencing technologies, capable of sequencing hundreds of millions of short reads (25-70 bp each) in a single run, is opening the door to population genomic studies of non-model species. In this paper we present SHRiMP, the SHort Read Mapping Package: a set of algorithms and methods to map short reads to a genome, even in the presence of a large amount of polymorphism. Our method is based upon a fast read mapping technique, separate thorough alignment methods for regular letter-space as well as AB SOLiD (color-space) reads, and a statistical model for false positive hits. We use SHRiMP to map reads from a newly sequenced Ciona savignyi individual to the reference genome. We demonstrate that SHRiMP can accurately map reads to this highly polymorphic genome, while confirming the high heterozygosity of C. savignyi in this second individual. SHRiMP is freely available at http://compbio.cs.toronto.edu/shrimp.
Pages: 11