HMMatch: Peptide identification by spectral matching of tandem mass spectra using hidden Markov models

Cited by: 15
Authors
Wu, Xue
Tseng, Chau-Wen
Edwards, Nathan
Affiliations
[1] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[2] Univ Maryland, Ctr Bioinformat & Computat Biol, College Pk, MD 20742 USA
Keywords
computational molecular biology; mass spectroscopy; HMM; peptide identification; algorithms
DOI
10.1089/cmb.2007.0071
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. The peptide fragmentation spectra generated by these workflows exhibit characteristic fragmentation patterns that can be used to identify the peptide. In other fields, where the compounds of interest do not have the convenient linear structure of peptides, fragmentation spectra are identified by comparing new spectra with libraries of identified spectra, an approach called spectral matching. In contrast to sequence-based tandem mass spectrometry search engines used for peptides, spectral matching can make use of the intensities of fragment peaks in library spectra to assess the quality of a match. We evaluate a hidden Markov model approach (HMMatch) to spectral matching, in which many examples of a peptide's fragmentation spectrum are summarized in a generative probabilistic model that captures the consensus and variation of each peak's intensity. We demonstrate that HMMatch has good specificity and superior sensitivity, compared to sequence database search engines such as X!Tandem. HMMatch achieves good results from relatively few training spectra, is fast to train, and can evaluate many spectra per second. A statistical significance model permits HMMatch scores to be compared with each other, and with other peptide identification tools, on a unified scale. HMMatch shows a similar degree of concordance with X!Tandem, Mascot, and NIST's MS Search, as they do with each other, suggesting that each tool can assign peptides to spectra that the others miss. Finally, we show that it is possible to extrapolate HMMatch models beyond a single peptide's training spectra to the spectra of related peptides, expanding the application of spectral matching techniques beyond the set of peptides previously observed.
Pages: 1025-1043
Page count: 19