Discriminative modelling of context-specific amino acid substitution probabilities

Cited by: 37
Authors
Angermueller, Christof [1,2]
Biegert, Andreas [3]
Soeding, Johannes [1,2]
Affiliations
[1] Univ Munich, Gene Ctr Munich, D-81377 Munich, Germany
[2] Univ Munich, Dept Biochem, D-81377 Munich, Germany
[3] Genedata, D-82152 Martinsried, Germany
Keywords
FOLD RECOGNITION; PROTEINS; MATRICES; DATABASE; TABLES; ALIGN
DOI
10.1093/bioinformatics/bts622
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
MOTIVATION: Protein sequence searching and alignment are fundamental tools of modern biology. Alignments are assessed by their similarity scores, essentially the sum of substitution matrix scores over all pairs of aligned amino acids. We previously proposed a generative probabilistic method that yields scores that take into account the sequence context around each aligned residue. This method showed drastically improved sensitivity and alignment quality compared with standard substitution matrix-based alignment.

RESULTS: Here, we develop an alternative, discriminative approach to predicting sequence context-specific substitution scores. We applied our approach to compute context-specific sequence profiles for the Basic Local Alignment Search Tool (BLAST) and compared the new tool (CS-BLASTdis) with BLAST and the previous context-specific version (CS-BLASTgen). On a dataset filtered to 20% maximum sequence identity, CS-BLASTdis was 51% more sensitive than BLAST and 17% more sensitive than CS-BLASTgen in detecting remote homologues at a 10% false discovery rate. At 30% maximum sequence identity, its alignments contain 21% and 12% more correct residue pairs than those of BLAST and CS-BLASTgen, respectively. Clear improvements are also seen when the approach is combined with PSI-BLAST and HHblits. We believe the context-specific approach should replace substitution matrices wherever sensitivity and alignment quality are critical.
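The standard scoring scheme the abstract contrasts against, a similarity score computed as the sum of substitution matrix scores over aligned residue pairs, can be sketched as follows. This is a minimal illustration under stated assumptions: it handles only an ungapped alignment (gap penalties are ignored), and `TOY_SCORES`, `pair_score` and `ungapped_alignment_score` are hypothetical names with a handful of illustrative values, not a complete real matrix and not code from CS-BLAST.

```python
from typing import Dict, Tuple

# Toy substitution scores, for illustration only (hypothetical, incomplete;
# not a full real matrix). The matrix is symmetric, so each unordered pair
# of amino acids is stored once.
TOY_SCORES: Dict[Tuple[str, str], int] = {
    ("A", "A"): 4, ("A", "S"): 1, ("S", "S"): 4,
    ("I", "I"): 4, ("I", "L"): 2, ("L", "L"): 4,
    ("A", "L"): -1,
}


def pair_score(a: str, b: str, scores: Dict[Tuple[str, str], int]) -> int:
    """Look up the substitution score for residues a and b (order-independent)."""
    if (a, b) in scores:
        return scores[(a, b)]
    return scores[(b, a)]  # raises KeyError if the pair is missing entirely


def ungapped_alignment_score(query: str, target: str,
                             scores: Dict[Tuple[str, str], int]) -> int:
    """Sum substitution scores over all aligned residue pairs (no gaps)."""
    if len(query) != len(target):
        raise ValueError("ungapped alignment requires equal-length segments")
    return sum(pair_score(q, t, scores) for q, t in zip(query, target))


print(ungapped_alignment_score("AIL", "SLL", TOY_SCORES))  # 1 + 2 + 4 = 7
```

Each aligned pair is scored independently of its neighbours here. A context-specific method instead makes the score at each position depend on the surrounding sequence window; how CS-BLAST does this is described in the paper and is not reproduced in this sketch.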
Pages: 3240-3247
Number of pages: 8
References (25 in total)
[1] Almeida, L.B. (1998). On-Line Learning in Neural Networks, p. 111.
[2] Altschul, S.F.; Madden, T.L.; Schaffer, A.A.; Zhang, J.H.; Zhang, Z.; Miller, W.; Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17), 3389-3402.
[3] [Anonymous] (1993). Probabilistic inference using Markov chain Monte Carlo methods.
[4] Baussand, J.; Deremble, C.; Carbone, A. (2007). Periodic distributions of hydrophobic amino acids allows the definition of fundamental building blocks to align distantly related proteins. Proteins: Structure, Function, and Bioinformatics, 67(3), 695-708.
[5] Biegert, A.; Soeding, J. (2009). Sequence context-specific profiles for homology searching. Proceedings of the National Academy of Sciences of the United States of America, 106(10), 3770-3775.
[6] Bottou, L. (2004). Lecture Notes in Artificial Intelligence, 3176, p. 146.
[7] Caruana, R. (2006). ACM International Conference Proceeding Series, p. 161. DOI: 10.1145/1143844.1143865.
[8] Dayhoff, M.O. (1978). Atlas of Protein Sequence and Structure, p. 345.
[9] Goonesekere, N.C.W.; Lee, B. (2008). Context-specific amino acid substitution matrices and their use in the detection of protein homologs. Proteins: Structure, Function, and Bioinformatics, 71(2), 910-919.
[10] Henikoff, S.; Henikoff, J.G. (1992). Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the United States of America, 89(22), 10915-10919.