Learning MHC I-peptide binding

Times cited: 71
Authors
Jojic, Nebojsa [1]
Reyes-Gomez, Manuel
Heckerman, David
Kadie, Carl
Schueler-Furman, Ora
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
[2] Hebrew Univ Jerusalem, Hadassah Med Sch, Dept Mol Biol & Biotechnol, Jerusalem, Israel
Keywords
PROTEIN; PREDICTION; POTENTIALS; DATABASE; ALLELES
DOI
10.1093/bioinformatics/btl255
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation and results: Motivated by the ability of a simple threading approach to predict MHC I-peptide binding, we developed a new and improved structure-based model whose parameters can be estimated from additional sources of data about MHC-peptide binding. In addition to the known 3D structures of a small number of MHC-peptide complexes used in the original threading approach, we included three other sources of information on peptide-MHC binding: (1) MHC class I sequences; (2) known binding energies for a large number of MHC-peptide complexes; and (3) an even larger binary dataset that contains information about strong binders (epitopes) and non-binders (peptides that have a low affinity for a particular MHC molecule). Our model significantly outperforms the standard threading approach in binding energy prediction. In our approach, which we call adaptive double threading, the parameters of the threading model are learnable, and both MHC and peptide sequences can be threaded onto structures of other alleles. These two properties make our model appropriate for predicting binding for alleles for which very little data (if any) is available beyond their sequence, including alleles for which 3D structures are not available. The ability of our model to generalize beyond the MHC types for which training data are available also distinguishes our approach from epitope prediction methods that treat MHC alleles as symbolic types rather than as biological sequences. We used the trained binding energy predictor to study viral infections in 246 HIV patients from the West Australian cohort, and over 1000 HIV clade B sequences from the Los Alamos National Laboratory database, capturing the course of HIV evolution over the last 20 years. Finally, we illustrate short-, medium-, and long-term adaptation of HIV to the human immune system.
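The abstract describes a threading model whose parameters are learnable from measured binding energies. As a rough illustration of that general idea only, and not the authors' actual model, the sketch below scores a peptide against an MHC sequence by summing a 20×20 pairwise residue potential over contacting positions, then fits that potential to binding energies by least squares. The contact list, the example sequences and energies, and the helper names `pair_counts`, `threading_energy`, and `fit_potential` are all hypothetical; in a structure-based approach the contacts would be derived from a solved MHC-peptide complex.

```python
import numpy as np

# Hypothetical toy setup: a 20x20 residue-pair potential scored over a fixed
# list of (MHC position, peptide position) contacts. Not the paper's model.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def pair_counts(mhc_seq, peptide, contacts):
    """Count each (MHC residue type, peptide residue type) pair among the contacts."""
    counts = np.zeros((len(AMINO_ACIDS), len(AMINO_ACIDS)))
    for i, j in contacts:
        counts[AA_INDEX[mhc_seq[i]], AA_INDEX[peptide[j]]] += 1
    return counts


def threading_energy(mhc_seq, peptide, contacts, potential):
    """Score = sum of potential[a, b] over every contacting residue pair (a, b)."""
    return float(np.sum(pair_counts(mhc_seq, peptide, contacts) * potential))


def fit_potential(examples, contacts):
    """Fit the potential to (mhc_seq, peptide, energy) triples by least squares.

    The score is linear in the potential entries, so each example becomes one
    row of pair counts; lstsq returns the minimum-norm fit when underdetermined.
    """
    X = np.array([pair_counts(m, p, contacts).ravel() for m, p, _ in examples])
    y = np.array([energy for _, _, energy in examples])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef.reshape(len(AMINO_ACIDS), len(AMINO_ACIDS))


# Usage with made-up sequences and energies (illustrative values only).
contacts = [(0, 0), (1, 1), (2, 2)]
examples = [("YAW", "LKV", -3.0), ("AAA", "LLL", -1.0), ("WYA", "VVK", -2.0)]
learned = fit_potential(examples, contacts)
print(round(threading_energy("YAW", "LKV", contacts, learned), 3))
```

Because the score is linear in the potential entries, learning them reduces to ordinary linear regression; with far fewer measurements than the 400 unknowns, `np.linalg.lstsq` returns the minimum-norm solution, and a practical fit would likely need regularization or more data. Scoring a different allele's sequence against the same contact list is only a simplified analogue of threading an MHC sequence onto another allele's structure, under the assumption that contact positions are shared across alleles.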
Pages: E227-E235
Page count: 9