Using sequence compression to speedup probabilistic profile matching

被引:13
作者
Freschi, V [1 ]
Bogliolo, A [1 ]
机构
[1] Univ Urbino, Informat Sci & Technol Inst, I-61029 Urbino, Italy
关键词
D O I
10.1093/bioinformatics/bti323
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: Matching a biological sequence against a probabilistic pattern (or profile) is a common task in computational biology. A probabilistic profile, represented as a scoring matrix, is more suitable than a deterministic pattern to retain the peculiarities of a given segment of a family of biological sequences. Brute-force algorithms take O(NP) to match a sequence of N characters against a profile of length P < N. Results: In this work, we exploit string compression techniques to speedup brute-force profile matching. We present two algorithms, based on run-length and LZ78 encodings, that reduce computational complexity by the compression factor of the encoding.
引用
收藏
页码:2225 / 2229
页数:5
相关论文
共 15 条
  • [1] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [2] Let sleeping files lie: Pattern matching in Z-compressed files
    Amir, A
    Benson, G
    Farach, M
    [J]. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1996, 52 (02) : 299 - 307
  • [3] [Anonymous], 1997, ALGORITHMS STRINGS T, DOI DOI 10.1017/CBO9780511574931
  • [4] Attwood T K, 2000, Brief Bioinform, V1, P45, DOI 10.1093/bib/1.1.45
  • [5] BECKSTETTE M, 2004, P GERM C BIOINF 2004
  • [6] A subquadratic sequence alignment algorithm for unrestricted scoring matrices
    Crochemore, M
    Landau, GM
    Ziv-Ukelson, M
    [J]. SIAM JOURNAL ON COMPUTING, 2003, 32 (06) : 1654 - 1673
  • [7] Dorohonceanu B, 2000, Proc Int Conf Intell Syst Mol Biol, V8, P128
  • [8] Gonnet G. H., 2004, Journal of Discrete Algorithms, V2, P3, DOI 10.1016/S1570-8667(03)00062-5
  • [9] Kärkkäinen J, 2000, LECT NOTES COMPUT SC, V1848, P195
  • [10] The efficient computation of position-specific match scores with the fast Fourier transform
    Rajasekaran, S
    Jin, X
    Spouge, JL
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2002, 9 (01) : 23 - 33