Mascot-Derived False Positive Peptide Identifications Revealed by Manual Analysis of Tandem Mass Spectra

Cited by: 44
Authors
Chen, Yue [1, 2]
Zhang, Junmei [2]
Xing, Gang [2]
Zhao, Yingming [2]
Affiliations
[1] Univ Texas Arlington, Dept Chem & Biochem, Arlington, TX 76019 USA
[2] Univ Texas SW Med Ctr Dallas, Dept Biochem, Dallas, TX 75390 USA
Keywords
protein identification; manual verification; automated database search; complex protein mixtures; amino-acid sequences; database search; posttranslational modifications; spectrometry data; algorithm; proteomics; validation; model; tags
DOI
10.1021/pr900172v
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
False positives that arise when MS/MS data are used to search protein sequence databases remain a concern in proteomics research. Here, we present five types of false positives identified when aligning sequences to MS/MS spectra with the Mascot database search software. False positives arise because of (1) enzymatic digestion at abnormal sites; (2) misinterpretation of charge states; (3) misinterpretation of protein modifications; (4) incorrect assignment of the protein modification site; and (5) incorrect use of isotopic peaks. We present examples, clearly identified as false positives by manual inspection, that were nevertheless assigned high scores by the Mascot sequence alignment algorithm. In some examples, the sequence assigned to the MS/MS spectrum explains more than 80% of the fragment ions present. Because of the high sequence similarity between the false positives and their corresponding true hits, the false positive rate cannot be evaluated by the common method of searching a reversed or scrambled sequence database. A common feature of the false positives is the presence of unmatched peaks in the MS/MS spectra. Our studies highlight the importance of using unmatched peaks to remove false positives and offer direction for developing better sequence alignment algorithms for peptide and PTM identification.
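The abstract's point about unmatched peaks can be illustrated with a minimal sketch. It is not Mascot's scoring and not the authors' procedure: it only computes singly charged b- and y-ion m/z values for a candidate peptide and reports the fraction of observed peaks that no theoretical ion explains. The function names `theoretical_ions` and `unmatched_fraction`, the 0.5 m/z tolerance, and the restriction to unmodified, singly charged b/y ions are all assumptions made for illustration.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def theoretical_ions(peptide):
    """Return singly charged b- and y-ion m/z values (hypothetical helper)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b_i: N-terminal fragment
        ions.append(sum(masses[-i:]) + WATER + PROTON)  # y_i: C-terminal fragment
    return ions


def unmatched_fraction(peptide, observed_mz, tol=0.5):
    """Fraction of observed peaks not within `tol` of any b/y ion.

    The tolerance is an assumed value, not one taken from the paper.
    """
    if not observed_mz:
        return 0.0
    ions = theoretical_ions(peptide)
    unmatched = [
        mz for mz in observed_mz
        if not any(abs(mz - ion) <= tol for ion in ions)
    ]
    return len(unmatched) / len(observed_mz)


if __name__ == "__main__":
    # Toy spectrum: every b/y ion of "GAK" plus two peaks it cannot explain.
    spectrum = theoretical_ions("GAK") + [300.0, 400.0]
    print(unmatched_fraction("GAK", spectrum))
```

A candidate sequence can explain most peaks and still leave a few intense ones unassigned; under this sketch's assumptions, a nonzero unmatched fraction is the kind of signal that would prompt manual inspection of the spectrum.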
Pages: 3141-3147
Page count: 7