Family pairwise search with embedded motif models

Cited by: 12
Authors
Grundy, WN [1]
Bailey, TL [2]
Affiliations
[1] Univ Calif Santa Cruz, Dept Comp Sci, Santa Cruz, CA 95064 USA
[2] SDSC, NPACI, La Jolla, CA 92093 USA
DOI
10.1093/bioinformatics/15.6.463
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Statistical models of protein families, such as position-specific scoring matrices, profiles and hidden Markov models, have been used effectively to find remote homologs when given a set of known protein family members. Unfortunately, training these models typically requires a relatively large set of training sequences. Recent work (Grundy, J. Comput. Biol., 5, 479-492, 1998) has shown that, when only a few family members are known, several theoretically justified statistical modeling techniques fail to provide homology detection performance on a par with Family Pairwise Search (FPS), an algorithm that combines scores from a pairwise sequence similarity algorithm such as BLAST.
Results: The present paper provides a model-based algorithm that improves FPS by incorporating hybrid motif-based models of the form generated by Cobbler (Henikoff and Henikoff, Protein Sci., 6, 698-705, 1997). For the 73 protein families investigated here, this cobbled FPS algorithm provides better homology detection performance than either Cobbler or FPS alone. This improvement is maintained when BLAST is replaced with the full Smith-Waterman algorithm.
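The idea of FPS described in the abstract, combining pairwise similarity scores between a candidate sequence and each known family member, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function names are hypothetical, the toy shared k-mer count merely stands in for a real pairwise algorithm such as BLAST or Smith-Waterman, and taking the maximum is only one possible combining rule (the abstract does not say which rule the authors use).

```python
from typing import Callable, Dict, Iterable, List, Tuple


def shared_kmers(a: str, b: str, k: int = 3) -> float:
    """Toy pairwise similarity (an assumption, not BLAST):
    the number of distinct k-mers that two sequences share."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return float(len(kmers_a & kmers_b))


def family_pairwise_search(
    family: List[str],
    database: Dict[str, str],
    pairwise: Callable[[str, str], float] = shared_kmers,
    combine: Callable[[Iterable[float]], float] = max,
) -> List[Tuple[str, float]]:
    """Score every database sequence against each family member,
    combine the per-member scores into one value, and rank the database.

    `combine` defaults to max purely for illustration; the source
    abstract does not specify the combining rule.
    """
    scores = {
        name: combine(pairwise(member, seq) for member in family)
        for name, seq in database.items()
    }
    # Highest combined score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    family = ["ACDEFGHIK", "CDEFGHIKL"]
    database = {"hit": "MACDEFGHIKLM", "miss": "WWWWYYYY"}
    for name, score in family_pairwise_search(family, database):
        print(name, score)
```

Passing a different `pairwise` or `combine` callable changes the similarity measure or the combining rule without touching the ranking logic, which mirrors the abstract's remark that BLAST can be replaced by another pairwise algorithm.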
Pages: 463-470
Page count: 8
Related papers
35 in total
  • [1] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [2] BASIC LOCAL ALIGNMENT SEARCH TOOL
    ALTSCHUL, SF
    GISH, W
    MILLER, W
    MYERS, EW
    LIPMAN, DJ
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) : 403 - 410
  • [3] [Anonymous], 1994, P INT C INT SYST MOL
  • [4] The PRINTS protein fingerprint database in its fifth year
    Attwood, TK
    Beck, ME
    Flower, DR
    Scordis, P
    Selley, JN
    [J]. NUCLEIC ACIDS RESEARCH, 1998, 26 (01) : 304 - 308
  • [5] Bailey T L, 1996, Proc Int Conf Intell Syst Mol Biol, V4, P15
  • [6] Score distributions for simultaneous matching to multiple motifs
    Bailey, TL
    Gribskov, M
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 1997, 4 (01) : 45 - 59
  • [7] Combining evidence using p-values: application to sequence homology searches
    Bailey, TL
    Gribskov, M
    [J]. BIOINFORMATICS, 1998, 14 (01) : 48 - 54
  • [8] BAILEY TL, 1999, MEME MULTIPLE EM MOT
  • [9] BAILEY TL, 1999, IN PRESS P 3 ANN INT
  • [10] BAIROCH A, 1994, NUCLEIC ACIDS RES, V22, P3578