Efficient remote homology detection using local structure

被引:53
作者
Hou, Y [1 ]
Hsu, W
Lee, ML
Bystroff, C
机构
[1] Natl Univ Singapore, Sch Comp, Singapore 117543, Singapore
[2] Rensselaer Polytech Inst, Dept Biol, Troy, NY 12180 USA
关键词
D O I
10.1093/bioinformatics/btg317
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: The function of an unknown biological sequence can often be accurately inferred if we are able to map this unknown sequence to its corresponding homologous family. At present, discriminative methods such as SVM-Fisher and SVM-pairwise, which combine support vector machine (SVM) and sequence similarity, are recognized as the most accurate methods, with SVM-pairwise being the most accurate. However, these methods typically encode sequence information into their feature vectors and ignore the structure information. They are also computationally inefficient. Based on these observations, we present an alternative method for SVM-based protein classification. Our proposed method, SVM-I-sites, utilizes structure similarity for remote homology detection. Result: We run experiments on the Structural Classification of Proteins 1.53 data set. The results show that SVM-I-sites is more efficient than SVM-pairwise. Further, we find that SVM-I-sites outperforms sequence-based methods such as PSI-BLAST, SAM, and SVM-Fisher while achieving a comparable performance with SVM-pairwise.
引用
收藏
页码:2294 / 2301
页数:8
相关论文
共 27 条
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]  
ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
[3]  
[Anonymous], 1990, T ASAE
[4]  
[Anonymous], 1990, METHOD ENZYMOL
[5]   HIDDEN MARKOV-MODELS OF BIOLOGICAL PRIMARY SEQUENCE INFORMATION [J].
BALDI, P ;
CHAUVIN, Y ;
HUNKAPILLER, T ;
MCCLURE, MA .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1994, 91 (03) :1059-1063
[6]   The ASTRAL compendium for protein structure and sequence analysis [J].
Brenner, SE ;
Koehl, P ;
Levitt, R .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :254-256
[7]   Prediction of local structure in proteins using a library of sequence-structure motifs [J].
Bystroff, C ;
Baker, D .
JOURNAL OF MOLECULAR BIOLOGY, 1998, 281 (03) :565-577
[8]   STANDARD STRUCTURES IN PROTEINS [J].
EFIMOV, AV .
PROGRESS IN BIOPHYSICS & MOLECULAR BIOLOGY, 1993, 60 (03) :201-239
[9]   Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching [J].
Gribskov, M ;
Robinson, NL .
COMPUTERS & CHEMISTRY, 1996, 20 (01) :25-33
[10]   PROFILE ANALYSIS - DETECTION OF DISTANTLY RELATED PROTEINS [J].
GRIBSKOV, M ;
MCLACHLAN, AD ;
EISENBERG, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1987, 84 (13) :4355-4358