MyriMatch: Highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis

Cited by: 413
Authors
Tabb, David L. [1]
Fernando, Christopher G.
Chambers, Matthew C.
Affiliations
[1] Vanderbilt Univ, Med Ctr, Mass Spectrometry Res Ctr, Dept Biomed Informat, Nashville, TN 37232 USA
[2] Vanderbilt Univ, Med Ctr, Mass Spectrometry Res Ctr, Dept Biochem, Nashville, TN 37232 USA
[3] W Virginia Univ, Inst Technol, Montgomery, WV 25136 USA
Keywords
proteomics; identification; statistical distribution; reversed database; peak filtering;
DOI
10.1021/pr0604054
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Shotgun proteomics experiments are dependent upon database search engines to identify peptides from tandem mass spectra. Many of these algorithms score potential identifications by evaluating the number of fragment ions matched between each peptide sequence and an observed spectrum. These systems, however, generally do not distinguish between matching an intense peak and matching a minor peak. We have developed a statistical model to score peptide matches that is based upon the multivariate hypergeometric distribution. This scorer, part of the "MyriMatch" database search engine, places greater emphasis on matching intense peaks. The probability that the best match for each spectrum has occurred by random chance can be employed to separate correct matches from random ones. We evaluated this software on data sets from three different laboratories employing three different ion trap instruments. Employing a novel system for testing discrimination, we demonstrate that stratifying peaks into multiple intensity classes improves the discrimination of scoring. We compare MyriMatch results to those of Sequest and X!Tandem, revealing that it is capable of higher discrimination than either of these algorithms. When minimal peak filtering is employed, performance plummets for a scoring model that does not stratify matched peaks by intensity. On the other hand, we find that MyriMatch discrimination improves as more peaks are retained in each spectrum. MyriMatch also scales well to tandem mass spectra from high-resolution mass analyzers. These findings may indicate limitations for existing database search scorers that count matched peaks without differentiating them by intensity. This software and source code is available under Mozilla Public License at this URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
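As a rough illustration of the scoring idea described in the abstract, the sketch below evaluates a multivariate hypergeometric probability for a peptide whose predicted fragment ions fall into several peak intensity classes. It is a minimal sketch under stated assumptions, not the MyriMatch implementation: the function names `mvh_probability` and `mvh_score`, the choice of three intensity classes plus one "empty" class, and all bin counts are hypothetical values chosen for the example.

```python
from math import comb, log


def mvh_probability(class_sizes, matched):
    """Multivariate hypergeometric probability of one match pattern.

    class_sizes[i]: number of m/z bins in class i (assumed layout:
                    intense, medium, weak, then bins with no peak).
    matched[i]:     number of predicted fragment ions landing in class i.
    """
    if len(class_sizes) != len(matched):
        raise ValueError("class_sizes and matched must have the same length")
    if any(k < 0 or k > n for n, k in zip(class_sizes, matched)):
        return 0.0  # impossible outcome: more matches than bins in a class

    total_bins = sum(class_sizes)   # all bins in the spectrum
    total_draws = sum(matched)      # all predicted fragment ions

    numerator = 1
    for n, k in zip(class_sizes, matched):
        numerator *= comb(n, k)
    return numerator / comb(total_bins, total_draws)


def mvh_score(class_sizes, matched):
    """Negative natural log of the probability; larger means less likely by chance."""
    return -log(mvh_probability(class_sizes, matched))


# Hypothetical spectrum: 10 intense, 20 medium, 40 weak, and 930 empty bins.
sizes = [10, 20, 40, 930]

# Two hypothetical peptides, each with 12 predicted fragment ions.
mostly_intense = [6, 3, 1, 2]   # matches concentrated in the intense class
mostly_weak = [1, 3, 6, 2]      # same number of matches, mostly weak peaks

print(mvh_score(sizes, mostly_intense))
print(mvh_score(sizes, mostly_weak))
```

Both hypothetical peptides match ten peaks, so a scorer that only counts matched peaks would rank them equally. Because the intense class holds far fewer bins than the weak class, landing six matches in it is much less probable by chance, and the first peptide receives the higher score.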
Pages: 654-661
Page count: 8
Related papers
22 in total
[1]   Potential for false positive identifications from large databases through tandem mass spectrometry [J].
Cargile, BJ ;
Bundy, JL ;
Stephenson, JL .
JOURNAL OF PROTEOME RESEARCH, 2004, 3 (05) :1082-1085
[2]   TANDEM: matching proteins with tandem mass spectra [J].
Craig, R ;
Beavis, RC .
BIOINFORMATICS, 2004, 20 (09) :1466-1467
[3]   De novo peptide sequencing via tandem mass spectrometry [J].
Dancík, V ;
Addona, TA ;
Clauser, KR ;
Vath, JE ;
Pevzner, PA .
JOURNAL OF COMPUTATIONAL BIOLOGY, 1999, 6 (3-4) :327-342
[4]  
Edwards N, 2002, LECT NOTES COMPUT SC, V2452, P68
[5]   AN APPROACH TO CORRELATE TANDEM MASS-SPECTRAL DATA OF PEPTIDES WITH AMINO-ACID-SEQUENCES IN A PROTEIN DATABASE [J].
ENG, JK ;
MCCORMACK, AL ;
YATES, JR .
JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY, 1994, 5 (11) :976-989
[6]  
FRIDMAN T, 2005, BIOINFORM COMPUT BIO, V3, P455
[7]   Open mass spectrometry search algorithm [J].
Geer, LY ;
Markey, SP ;
Kowalak, JA ;
Wagner, L ;
Xu, M ;
Maynard, DM ;
Yang, XY ;
Shi, WY ;
Bryant, SH .
JOURNAL OF PROTEOME RESEARCH, 2004, 3 (05) :958-964
[8]   Randomized sequence databases for tandem mass spectrometry peptide and protein identification [J].
Higdon, R ;
Hogan, JM ;
Van Belle, G ;
Kolker, E .
OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY, 2005, 9 (04) :364-379
[9]   Global survey of organ and organelle protein expression in mouse: Combined proteomic and transcriptomic profiling [J].
Kislinger, T ;
Cox, B ;
Kannan, A ;
Chung, C ;
Hu, PZ ;
Ignatchenko, A ;
Scott, MS ;
Gramolini, AO ;
Morris, Q ;
Hallett, MT ;
Rossant, J ;
Hughes, TR ;
Frey, B ;
Emili, A .
CELL, 2006, 125 (01) :173-186
[10]   MS1, MS2, and SQT - three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications [J].
McDonald, WH ;
Tabb, DL ;
Sadygov, RG ;
MacCoss, MJ ;
Venable, J ;
Graumann, J ;
Johnson, JR ;
Cociorva, D ;
Yates, JR .
RAPID COMMUNICATIONS IN MASS SPECTROMETRY, 2004, 18 (18) :2162-2168