Contact prediction using mutual information and neural nets

Cited by: 82
Authors
Shackelford, George [1]
Karplus, Kevin [1]
Affiliations
[1] Univ Calif Santa Cruz, Dept Biomol Engn, Santa Cruz, CA 95064 USA
Keywords
contact prediction; CASP7; SAM_T06; neural net; significance of mutual information; contingency tables; gamma distribution
DOI
10.1002/prot.21791
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010 ; 081704 ;
Abstract
Prediction of protein structures continues to be a difficult problem, particularly when there are no solved structures for homologous proteins to use as templates. Local structure prediction (secondary structure and burial) is fairly reliable, but does not provide enough information to produce complete three-dimensional structures. Residue-residue contact prediction, though still not highly reliable, may provide a useful guide for assembling local structure predictions into a full tertiary prediction. We develop a neural network that is applied to pairs of residue positions and outputs a probability of contact between the positions. One of the neural net inputs is a novel statistic for detecting correlated mutations: the statistical significance of the mutual information between the corresponding columns of a multiple sequence alignment. This statistic, combined with a second statistic based on the propensity of two amino acid types being in contact, results in a simple neural network that is a good predictor of contacts. With additional features from amino-acid distributions and local structure predictions, the final neural network predicts contacts better than other submitted contact predictions at CASP7, including contact predictions derived from fragment-based tertiary models on free-modeling domains. It is still not known if contact predictions can improve tertiary models on free-modeling domains. Available at http://www.soe.ucsc.edu/research/compbio/SAM_T06/T06-query.html.
Pages: 159-164
Page count: 6
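
The correlated-mutation statistic described in the abstract (the significance of the mutual information between two alignment columns) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `mutual_information` and `permutation_p_value` are hypothetical, and the shuffle-based p-value is a simple stand-in, since this record does not describe how the authors actually estimate significance.

```python
import math
import random
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equal-length alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same length")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))  # contingency table of residue pairs
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p_ab * log2(p_ab / (p_a * p_b)), written with raw counts
        mi += (n_ab / n) * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def permutation_p_value(col_a, col_b, n_shuffles=1000, seed=0):
    """Empirical p-value: how often a shuffled column gives MI >= the observed MI.

    Illustrative stand-in only; not the significance estimate used in the paper.
    """
    rng = random.Random(seed)
    observed = mutual_information(col_a, col_b)
    shuffled = list(col_b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any pairing between the two columns
        if mutual_information(col_a, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    # Toy columns: each character is the residue one sequence has at that position.
    column_i = "A" * 10 + "C" * 10
    column_j = "G" * 10 + "T" * 10
    print(mutual_information(column_i, column_j))
    print(permutation_p_value(column_i, column_j))
```

Shuffling one column preserves each column's residue frequencies while destroying the pairing, so the p-value reflects covariation rather than conservation alone. In a predictor like the one the abstract describes, a value of this kind would be one input per residue pair, alongside other features.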