Comprehensive evaluation of protein structure alignment methods: Scoring by geometric measures

Cited by: 214
Authors
Kolodny, R
Koehl, P
Levitt, M
Affiliations
[1] Stanford Univ, Dept Biol Struct, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Keywords
comparison of structural alignment; protein structure alignment; protein structure comparison; geometric measures; ROC curves;
DOI
10.1016/j.jmb.2004.12.032
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
We report the largest and most comprehensive comparison of protein structural alignment methods. Specifically, we evaluate six publicly available structure alignment programs: SSAP, STRUCTAL, DALI, LSQMAN, CE and SSM by aligning all 8,581,970 protein structure pairs in a test set of 2930 protein domains specially selected from CATH v.2.4 to ensure sequence diversity. We consider an alignment good if it matches many residues, and the two substructures are geometrically similar. Even with this definition, evaluating structural alignment methods is not straightforward. At first, we compared the rates of true and false positives using receiver operating characteristic (ROC) curves with the CATH classification taken as a gold standard. This proved unsatisfactory in that the quality of the alignments is not taken into account: sometimes a method that finds less good alignments scores better than a method that finds better alignments. We correct this intrinsic limitation by using four different geometric match measures (SI, MI, SAS, and GSAS) to evaluate the quality of each structural alignment. With this improved analysis we show that there is a wide variation in the performance of different methods; the main reason for this is that it can be difficult to find a good structural alignment between two proteins even when such an alignment exists. We find that STRUCTAL and SSM perform best, followed by LSQMAN and CE. Our focus on the intrinsic quality of each alignment allows us to propose a new method, called "Best-of-All" that combines the best results of all methods. Many commonly used methods miss 10-50% of the good Best-of-All alignments. By putting existing structural alignments into proper perspective, our study allows better comparison of protein structures. By highlighting limitations of existing methods, it will spur the further development of better structural alignment methods. 
This will have significant biological implications now that structural comparison has come to play a central role in the analysis of experimental work on protein structure, protein function and protein evolution. © 2005 Elsevier Ltd. All rights reserved.
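The pair count and the "Best-of-All" idea in the abstract can be illustrated with a minimal sketch. The stated 8,581,970 pairs equals 2930 × 2929, which suggests ordered pairs of distinct domains. Everything else below is an assumption made for illustration and is not taken from the paper: the placeholder names `method_a` and `method_b`, the toy scores, the convention that a lower score means a better alignment, and the cutoff `GOOD_THRESHOLD`.

```python
def ordered_pair_count(n_domains: int) -> int:
    """Number of ordered pairs (a, b) with a != b among n_domains structures."""
    return n_domains * (n_domains - 1)


def best_of_all(scores_by_method):
    """For each pair, keep the best score found by any method.

    scores_by_method maps method name -> {pair: score}.
    Assumption: a lower score means a better alignment.
    Returns {pair: (best_score, method_that_found_it)}.
    """
    best = {}
    for method, scores in scores_by_method.items():
        for pair, score in scores.items():
            if pair not in best or score < best[pair][0]:
                best[pair] = (score, method)
    return best


def fraction_missed(scores_by_method, method, threshold):
    """Fraction of good Best-of-All alignments that `method` fails to match.

    A pair counts as good when its Best-of-All score is <= threshold;
    `method` misses it when its own score for that pair is above the
    threshold or absent.
    """
    best = best_of_all(scores_by_method)
    good_pairs = [pair for pair, (score, _) in best.items() if score <= threshold]
    if not good_pairs:
        return 0.0
    own = scores_by_method[method]
    missed = sum(1 for pair in good_pairs if own.get(pair, float("inf")) > threshold)
    return missed / len(good_pairs)


# Hypothetical toy data: two placeholder methods scoring three domain pairs.
GOOD_THRESHOLD = 5.0  # illustrative cutoff, not a value from the paper
scores = {
    "method_a": {("d1", "d2"): 2.0, ("d1", "d3"): 9.0, ("d2", "d3"): 3.5},
    "method_b": {("d1", "d2"): 6.0, ("d1", "d3"): 3.0, ("d2", "d3"): 8.0},
}

if __name__ == "__main__":
    print(ordered_pair_count(2930))
    for name in scores:
        print(name, fraction_missed(scores, name, GOOD_THRESHOLD))
```

In this toy setting each placeholder method misses at least one pair that the combined result aligns well, which is the kind of gap the abstract reports as 10-50% for many commonly used methods.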
Pages: 1173-1188
Page count: 16