Derivation of an amino acid similarity matrix for peptide:MHC binding and its application as a Bayesian prior

Cited by: 148
Authors
Kim, Yohan [1]
Sidney, John [1]
Pinilla, Clemencia [2]
Sette, Alessandro [1]
Peters, Bjoern [1]
Affiliations
[1] La Jolla Inst Allergy & Immunol, Div Vaccine Discovery, La Jolla, CA USA
[2] Torrey Pines Inst Mol Studies, San Diego, CA USA
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Funding
National Institutes of Health (USA);
Keywords
T-CELL EPITOPES; SUBSTITUTION MATRICES; PROTEIN SEQUENCES; PREDICTION; SPECIFICITY; MOLECULES; DATABASE; AFFINITIES;
DOI
10.1186/1471-2105-10-394
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: Experts in peptide:MHC binding studies are often able to estimate the impact of a single residue substitution based on a heuristic understanding of amino acid similarity in an experimental context. Our aim is to quantify this measure of similarity to improve peptide:MHC binding prediction methods. This should help compensate for holes and bias in the sequence space coverage of existing peptide binding datasets.
Results: Here, a novel amino acid similarity matrix (PMBEC) is directly derived from the binding affinity data of combinatorial peptide mixtures. Like BLOSUM62, this matrix captures well-known physicochemical properties of amino acid residues. However, PMBEC differs markedly from existing matrices in cases where residue substitution involves a reversal of electrostatic charge. To demonstrate its usefulness, we have developed a new peptide:MHC class I binding prediction method, using the matrix as a Bayesian prior. We show that the new method can compensate for missing information on specific residues in the training data. We also carried out a large-scale benchmark, and its results indicate that prediction performance of the new method is comparable to that of the best neural-network-based approaches for peptide:MHC class I binding.
Conclusion: A novel amino acid similarity matrix has been derived for peptide:MHC binding interactions. One prominent feature of the matrix is that it disfavors substitution of residues with opposite charges. Given that the matrix was derived from experimentally determined peptide:MHC binding affinity measurements, this feature is likely shared by all peptide:protein interactions. In addition, we have demonstrated the usefulness of the matrix as a Bayesian prior in an improved scoring-matrix-based peptide:MHC class I prediction method. A software implementation of the method is available at: http://www.mhc-pathway.net/smmpmbec.
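The abstract's two ideas (a similarity matrix derived from binding data, and its use as a prior to fill in residues missing from training data) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that similarity is measured as the covariance between per-residue binding contribution profiles, and that the prior is a zero-mean Gaussian whose covariance is that matrix. The profile values, the residue subset, and the function names `covariance`, `similarity_matrix`, and `prior_guess` are all hypothetical.

```python
# Toy binding contributions (made-up numbers, NOT data from the paper):
# one list per residue, one entry per hypothetical (allele, position) context.
toy_profiles = {
    "D": [1.0, 0.8, -0.2, 0.9],
    "E": [0.9, 0.7, -0.1, 1.0],
    "K": [-0.8, -0.9, 0.3, -0.7],
    "R": [-0.9, -0.7, 0.2, -0.8],
}


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def similarity_matrix(profiles):
    """Pairwise covariance between residue profiles, as a nested dict."""
    return {
        a: {b: covariance(profiles[a], profiles[b]) for b in profiles}
        for a in profiles
    }


def prior_guess(sim, observed, observed_score, missing):
    """Conditional mean for `missing` given one observed residue score,
    under an assumed zero-mean Gaussian prior with covariance `sim`."""
    return sim[missing][observed] / sim[observed][observed] * observed_score


sim = similarity_matrix(toy_profiles)

# Suppose only D was seen at some position, with a score of 1.0.
guess_e = prior_guess(sim, "D", 1.0, "E")  # same charge as D
guess_k = prior_guess(sim, "D", 1.0, "K")  # opposite charge to D
print(guess_e, guess_k)
```

With these toy numbers the guess for E has the same sign as the observed score for D, while the guess for K has the opposite sign, which mirrors the abstract's point that the matrix disfavors substitutions between oppositely charged residues.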
Pages: 11