Discrimination between distant homologs and structural analogs: Lessons from manually constructed, reliable data sets

Cited by: 22
Authors
Cheng, Hua [1]
Kim, Bong-Hyun
Grishin, Nick V.
Affiliations
[1] Univ Texas SW Med Ctr Dallas, Howard Hughes Med Inst, Dallas, TX 75390 USA
Keywords
homology; analogy; discrimination; protein structures; support vector machines
DOI
10.1016/j.jmb.2007.12.076
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
A natural way to study protein sequence, structure, and function is to put them in the context of evolution. Homologs inherit similarities from their common ancestor, while analogs converge to similar structures due to a limited number of energetically favorable ways to pack secondary structural elements. Using novel strategies, we previously assembled two reliable databases of homologs and analogs. In this study, we compare these two data sets and develop a support vector machine (SVM)-based classifier to discriminate between homologs and analogs. The classifier uses a number of well-known similarity scores. We observe that although both structure scores and sequence scores contribute to SVM performance, profile sequence scores computed based on structural alignments are the best discriminators between remote homologs and structural analogs. We apply our classifier to a representative set from the expert-constructed database, Structural Classification of Proteins (SCOP). The SVM classifier recovers 76% of the remote homologs defined as domains in the same SCOP superfamily but from different families. More importantly, we also detect and discuss interesting homologous relationships between SCOP domains from different superfamilies, folds, and even classes. © 2008 Elsevier Ltd. All rights reserved.
Pages: 1265-1278
Number of pages: 14