RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data

Cited by: 291
Authors
Zhao, Yongan [1]
Tang, Haixu [1,2]
Ye, Yuzhen [1]
Affiliations
[1] Indiana Univ, Sch Informat & Comp, Bloomington, IN 47404 USA
[2] Indiana Univ, Ctr Genom & Bioinformat, Bloomington, IN 47404 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
BLAST
DOI
10.1093/bioinformatics/btr595
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
With the wide application of next-generation sequencing (NGS) techniques, fast tools for protein similarity search that scale well to large query datasets and large databases are highly desirable. In a previous work, we developed RAPSearch, an algorithm that achieved a roughly 20- to 90-fold speedup relative to BLAST while still achieving similar levels of sensitivity for short protein fragments derived from NGS data. RAPSearch, however, requires a substantial memory footprint to identify alignment seeds, due to its use of a suffix array data structure. Here we present RAPSearch2, a new memory-efficient implementation of the RAPSearch algorithm that uses a collision-free hash table to index a similarity search database. The use of an optimized data structure speeds up the similarity search by a further 2-3 times. We also implemented multi-threading in RAPSearch2, and the multi-thread modes achieve significant acceleration (e.g. 3.5X for 4-thread mode). RAPSearch2 requires up to 2G memory when running in single-thread mode, or up to 3.5G memory when running in 4-thread mode.
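The abstract does not spell out how the collision-free hash table is built, so the following is only a minimal sketch of the general idea, not RAPSearch2's actual implementation. It assumes fixed-length seeds over the standard 20-letter amino acid alphabet and encodes each k-mer as a base-20 integer, so two different k-mers of the same length can never share a key. The names `kmer_key`, `build_index` and `find_seeds` are hypothetical, and the seed length is an illustrative choice.

```python
from collections import defaultdict

# Assumption: plain 20-letter amino acid alphabet; the real tool may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CODE = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def kmer_key(kmer):
    """Encode a k-mer as a base-20 integer.

    For a fixed k, distinct k-mers always map to distinct keys,
    which is what makes the table collision-free.
    """
    key = 0
    for aa in kmer:
        key = key * len(AMINO_ACIDS) + CODE[aa]
    return key


def build_index(db, k):
    """Map each k-mer key to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in db.items():
        for pos in range(len(seq) - k + 1):
            kmer = seq[pos:pos + k]
            if all(aa in CODE for aa in kmer):  # skip ambiguous residues
                index[kmer_key(kmer)].append((seq_id, pos))
    return index


def find_seeds(index, query, k):
    """Return (query offset, sequence id, database offset) for exact k-mer hits."""
    hits = []
    for qpos in range(len(query) - k + 1):
        kmer = query[qpos:qpos + k]
        if all(aa in CODE for aa in kmer):
            for seq_id, dpos in index.get(kmer_key(kmer), []):
                hits.append((qpos, seq_id, dpos))
    return hits


# Toy database and query, invented purely for illustration.
db = {"p1": "MKVLAAGIV", "p2": "GGKVLAW"}
index = build_index(db, 4)
seeds = find_seeds(index, "AKVLAA", 4)
```

Because every key is an integer below 20^k, the same encoding could address a flat array directly instead of a dictionary, which is one way such an index can use less memory than a suffix array over the whole database.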
Pages: 125-126
Page count: 2
Related papers
10 in total
  • [1] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [2] BASIC LOCAL ALIGNMENT SEARCH TOOL
    ALTSCHUL, SF
    GISH, W
    MILLER, W
    MYERS, EW
    LIPMAN, DJ
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) : 403 - 410
  • [3] Brady A, 2009, NAT METHODS, V6, P673, DOI 10.1038/nmeth.1358
  • [4] Functional metagenomic profiling of nine biomes
    Dinsdale, Elizabeth A.
    Edwards, Robert A.
    Hall, Dana
    Angly, Florent
    Breitbart, Mya
    Brulc, Jennifer M.
    Furlan, Mike
    Desnues, Christelle
    Haynes, Matthew
    Li, Linlin
    McDaniel, Lauren
    Moran, Mary Ann
    Nelson, Karen E.
    Nilsson, Christina
    Olson, Robert
    Paul, John
    Brito, Beltran Rodriguez
    Ruan, Yijun
    Swan, Brandon K.
    Stevens, Rick
    Valentine, David L.
    Thurber, Rebecca Vega
    Wegley, Linda
    White, Bryan A.
    Rohwer, Forest
    [J]. NATURE, 2008, 452 (7187) : 629 - U8
  • [5] MEGAN analysis of metagenomic data
    Huson, Daniel H.
    Auch, Alexander F.
    Qi, Ji
    Schuster, Stephan C.
    [J]. GENOME RESEARCH, 2007, 17 (03) : 377 - 386
  • [6] Kent WJ, 2002, GENOME RES, V12, P656, DOI 10.1101/gr.229202
  • [7] RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays
    Marioni, John C.
    Mason, Christopher E.
    Mane, Shrikant M.
    Stephens, Matthew
    Gilad, Yoav
    [J]. GENOME RESEARCH, 2008, 18 (09) : 1509 - 1517
  • [8] Metagenomics: Genomic analysis of microbial communities
    Riesenfeld, CS
    Schloss, PD
    Handelsman, J
    [J]. ANNUAL REVIEW OF GENETICS, 2004, 38 : 525 - 552
  • [9] A core gut microbiome in obese and lean twins
    Turnbaugh, Peter J.
    Hamady, Micah
    Yatsunenko, Tanya
    Cantarel, Brandi L.
    Duncan, Alexis
    Ley, Ruth E.
    Sogin, Mitchell L.
    Jones, William J.
    Roe, Bruce A.
    Affourtit, Jason P.
    Egholm, Michael
    Henrissat, Bernard
    Heath, Andrew C.
    Knight, Rob
    Gordon, Jeffrey I.
    [J]. NATURE, 2009, 457 (7228) : 480 - U7
  • [10] RAPSearch: a fast protein similarity search tool for short reads
    Ye, Yuzhen
    Choi, Jeong-Hyeon
    Tang, Haixu
    [J]. BMC BIOINFORMATICS, 2011, 12