Measuring the similarity of protein structures by means of the universal similarity metric

Cited by: 87
Authors
Krasnogor, N [1]
Pelta, DA
Affiliations
[1] Univ Nottingham, Automated Scheduling Optimisat & Planning Grp, Nottingham NG8 1BB, England
[2] Univ Granada, ETSI Informat, Dept Comp Sci & Artificial Intelligence, E-18071 Granada, Spain
DOI
10.1093/bioinformatics/bth031
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: As an increasing number of protein structures become available, the need for algorithms that can quantify the similarity between protein structures increases as well. Thus, the comparison of protein structures, and their clustering according to a given similarity measure, is at the core of today's biomedical research. In this paper, we show how a Universal Similarity Metric (USM) inspired by algorithmic information theory can be used to calculate similarities between protein pairs. The method, besides being theoretically supported, is surprisingly simple to implement and computationally efficient.
Results: Structural similarity between proteins in four different datasets was measured using the USM. The sample employed represented alpha, beta, alpha-beta, TIM-barrel, globin and serpin protein types. The use of the proposed metric allows for a correct measurement of similarity and classification of the proteins in the four datasets.
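The abstract does not spell out how the USM is computed. A common computable stand-in, assumed here, is the normalized compression distance, which replaces Kolmogorov complexity with the output length of a real compressor. The sketch below uses Python's zlib; the function names, the choice of compressor, and the encoding of a structure as a byte string (for example a flattened contact map) are illustrative assumptions, not the authors' implementation.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a rough proxy for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their information;
    values near 1 suggest they share very little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # cost of describing both inputs together
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical structure encodings; real inputs would be derived from protein data.
    structure_a = b"0110100110010110" * 40
    structure_b = b"0110100110010111" * 40
    print(ncd(structure_a, structure_b))
```

A pairwise matrix of such distances could then be passed to any standard clustering routine. Because real compressors only approximate Kolmogorov complexity, the result depends on the compressor and on how each structure is serialized.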
Pages: 1015-1021
Number of pages: 7