Hidden Markov models for detecting remote protein homologies

Cited by: 798
Authors
Karplus, K [1]
Barrett, C [1]
Hughey, R [1]
Affiliations
[1] Univ Calif Santa Cruz, Jack Baskin Sch Engn, Dept Comp Sci, Santa Cruz, CA 95064 USA
DOI
10.1093/bioinformatics/14.10.846
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: A new hidden Markov model method (SAM-T98) for finding remote homologs of protein sequences is described and evaluated. The method begins with a single target sequence and iteratively builds a hidden Markov model (HMM) from the sequence and homologs found using the HMM for database search. SAM-T98 is also used to construct model libraries automatically from sequences in structural databases.
Methods: We evaluate the SAM-T98 method with four datasets. Three of the test sets are fold-recognition tests, where the correct answers are determined by structural similarity. The fourth uses a curated database. The method is compared against WU-BLASTP and against DOUBLE-BLAST, a two-step method similar to ISS, but using BLAST instead of FASTA.
Results: SAM-T98 had the fewest errors in all tests, dramatically so for the fold-recognition tests. At the minimum-error point on the SCOP (Structural Classification of Proteins)-domains test, SAM-T98 got 880 true positives and 68 false positives, DOUBLE-BLAST got 533 true positives with 71 false positives, and WU-BLASTP got 353 true positives with 24 false positives. The method is optimized to recognize superfamilies, and would require parameter adjustment to be used to find family or fold relationships. One key to the performance of the HMM method is a new score-normalization technique that compares the score to the score with a reversed model rather than to a uniform null model.
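The reversed-model normalization mentioned at the end of the abstract can be illustrated with a small sketch. This is not SAM-T98's implementation: a toy ungapped, position-specific profile over a four-letter alphabet stands in for the HMM, and the names `log_prob`, `reverse_null_score`, `uniform_null_score` and the `profile` values are all hypothetical. It only shows the general idea of scoring a sequence against a model and against the same model reversed, and taking the difference.

```python
import math

ALPHABET = "ACDE"  # toy alphabet (assumption), not the 20 amino acids


def log_prob(profile, seq):
    """Log-probability of seq under an ungapped position-specific profile."""
    return sum(math.log(column[residue]) for column, residue in zip(profile, seq))


def reverse_null_score(profile, seq):
    """Log-odds of seq under the profile versus the same profile reversed."""
    reversed_profile = profile[::-1]
    return log_prob(profile, seq) - log_prob(reversed_profile, seq)


def uniform_null_score(profile, seq):
    """Log-odds of seq under the profile versus a uniform null model."""
    return log_prob(profile, seq) - len(seq) * math.log(1.0 / len(ALPHABET))


# Hypothetical 3-column profile favouring A, then C, then D.
profile = [
    {"A": 0.7, "C": 0.1, "D": 0.1, "E": 0.1},
    {"A": 0.1, "C": 0.7, "D": 0.1, "E": 0.1},
    {"A": 0.1, "C": 0.1, "D": 0.7, "E": 0.1},
]

for seq in ("ACD", "ACA", "DCA"):
    print(seq, round(reverse_null_score(profile, seq), 3),
          round(uniform_null_score(profile, seq), 3))
```

In this toy setting, a sequence that matches the profile equally well in both directions (such as `ACA`) gets a reversed-model score of zero even though its uniform-null score is positive, which suggests why a reversed null can discount matches that arise from composition rather than from residue order.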
Pages: 846-856
Page count: 11