Remote homology detection: a motif based approach

Cited by: 109
Authors
Ben-Hur, Asa [1]
Brutlag, Douglas [1]
Affiliations
[1] Stanford Univ, Dept Biochem, Beckman Ctr B400, Stanford, CA 94305 USA
Keywords
remote homology; discrete sequence motifs; sequence similarity; Support Vector Machines; kernel methods;
DOI
10.1093/bioinformatics/btg1002
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: Remote homology detection is the problem of detecting homology in cases of low sequence similarity. It is a hard computational problem with no approach that works well in all cases.
Results: We present a method for detecting remote homology that is based on the presence of discrete sequence motifs. The motif content of a pair of sequences is used to define a similarity that is used as a kernel for a Support Vector Machine (SVM) classifier. We test the method on two remote homology detection tasks: prediction of a previously unseen SCOP family and prediction of an enzyme class given other enzymes that have a similar function on other substrates. We find that it performs significantly better than an SVM method that uses BLAST or Smith-Waterman similarity scores as features.
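
The idea of turning shared motif content into a kernel can be illustrated with a minimal sketch. It is not the authors' implementation: the three regular-expression motifs, the toy sequences, and the function names `motif_counts`, `motif_kernel`, and `gram_matrix` are all assumptions made for illustration, and the similarity used here is simply the dot product of motif occurrence counts.

```python
import re

# Hypothetical motifs written as regular expressions. A real study would take
# motifs from a motif database; these three are invented for illustration.
MOTIFS = {
    "m1": r"C..C",
    "m2": r"G[KR]S",
    "m3": r"H.H",
}


def motif_counts(seq):
    """Count occurrences of each motif in seq, including overlapping ones."""
    counts = {}
    for name, pattern in MOTIFS.items():
        # A lookahead matches at every start position without consuming
        # characters, so overlapping occurrences are all counted.
        counts[name] = len(re.findall(f"(?={pattern})", seq))
    return counts


def motif_kernel(seq_a, seq_b):
    """Similarity of two sequences: dot product of their motif count vectors."""
    counts_a = motif_counts(seq_a)
    counts_b = motif_counts(seq_b)
    return sum(counts_a[name] * counts_b[name] for name in MOTIFS)


def gram_matrix(seqs):
    """Pairwise kernel values for a list of sequences."""
    return [[motif_kernel(a, b) for b in seqs] for a in seqs]


if __name__ == "__main__":
    toy_sequences = ["ACAACGKSHAH", "MGRSLLGKS", "AAAA"]
    for row in gram_matrix(toy_sequences):
        print(row)
```

Because the value is an inner product of explicit feature vectors, the resulting matrix is symmetric and positive semidefinite, which is what an SVM requires of a kernel. Such a matrix could then be handed to any SVM implementation that accepts precomputed kernels.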
Pages: i26-i33
Page count: 8