1001 optimal PDB structure alignments: Integer programming methods for finding the maximum contact map overlap

被引:89
作者
Caprara, A
Carr, R
Istrail, S
Lancia, G
Walenz, B
机构
[1] Univ Udine, DIMI, I-33100 Udine, Italy
[2] Celera Appl Biosyst, Informat Res, Rockville, MD 20850 USA
[3] Univ Bologna, DEIS, I-40136 Bologna, Italy
[4] Sandia Natl Labs, Albuquerque, NM 87185 USA
关键词
integer programming; Lagrangian relaxation; protein structure alignment; contact map overlap;
D O I
10.1089/106652704773416876
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Protein structure comparison is a fundamental problem for structural genomics, with applications to drug design, fold prediction, protein clustering, and evolutionary studies. Despite its importance, there are very few rigorous methods and widely accepted similarity measures known for this problem. In this paper we describe the last few years of developments on the study of an emerging measure, the contact map overlap (CMO), for protein structure comparison. A contact map is a list of pairs of residues which lie in three-dimensional proximity in the protein's native fold. Although this measure is in principle computationally hard to optimize, we show how it can in fact be computed with great accuracy for related proteins by integer linear programming techniques. These methods have the advantage of providing certificates of near-optimality by means of upper bounds to the optimal alignment value. We also illustrate effective heuristics, such as local search and genetic algorithms. We were able to obtain for the first time optimal alignments for large similar proteins (about 1,000 residues and 2,000 contacts) and used the CMO measure to cluster proteins in families. The clusters obtained were compared to SCOP classification in order to validate the measure. Extensive computational experiments showed that alignments which are off by at most 10% from the optimal value can be computed in a short time. Further experiments showed how this measure reacts to the choice of the threshold defining a contact and how to choose this threshold in a sensible way.
引用
收藏
页码:27 / 52
页数:26
相关论文
共 44 条
  • [1] A TIGHT LINEARIZATION AND AN ALGORITHM FOR ZERO-ONE QUADRATIC-PROGRAMMING PROBLEMS
    ADAMS, WP
    SHERALI, HD
    [J]. MANAGEMENT SCIENCE, 1986, 32 (10) : 1274 - 1290
  • [2] [Anonymous], 1989, GENETIC ALGORITHM SE
  • [3] [Anonymous], P ANN INT C COMP MOL, DOI [10.1145/565196.565209., DOI 10.1145/565196.565209]
  • [4] A LIFT-AND-PROJECT CUTTING PLANE ALGORITHM FOR MIXED 0-1 PROGRAMS
    BALAS, E
    CERIA, S
    CORNUEJOLS, G
    [J]. MATHEMATICAL PROGRAMMING, 1993, 58 (03) : 295 - 324
  • [5] FINDING A MAXIMUM CLIQUE IN AN ARBITRARY GRAPH
    BALAS, E
    YU, CS
    [J]. SIAM JOURNAL ON COMPUTING, 1986, 15 (04) : 1054 - 1068
  • [6] The Protein Data Bank
    Berman, HM
    Westbrook, J
    Feng, Z
    Gilliland, G
    Bhat, TN
    Weissig, H
    Shindyalov, IN
    Bourne, PE
    [J]. NUCLEIC ACIDS RESEARCH, 2000, 28 (01) : 235 - 242
  • [7] A heuristic method for the set covering problem
    Caprara, A
    Fischetti, M
    Toth, P
    [J]. OPERATIONS RESEARCH, 1999, 47 (05) : 730 - 743
  • [8] CARRARESI P, 1994, DIMACS SERIES DISCRE, V16, P147
  • [9] Cela E., 1998, The Quadratic Assignment Problem: Theory and Algorithms
  • [10] Cook W., 1998, Combinatorial Optimization