Semi-supervised learning for peptide identification from shotgun proteomics datasets

Cited by: 1706
Authors
Kall, Lukas
Canterbury, Jesse D.
Weston, Jason
Noble, William Stafford
MacCoss, Michael J.
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[2] Amer Inc, NEC Labs, Princeton, NJ 08540 USA
[3] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
DOI
10.1038/NMETH1113
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Shotgun proteomics uses liquid chromatography-tandem mass spectrometry to identify proteins in complex biological samples. We describe an algorithm, called Percolator, for improving the rate of confident peptide identifications from a collection of tandem mass spectra. Percolator uses semi-supervised machine learning to discriminate between correct and decoy spectrum identifications, correctly assigning peptides to 17% more spectra from a tryptic Saccharomyces cerevisiae dataset, and up to 77% more spectra from non-tryptic digests, relative to a fully supervised approach.
Pages: 923-925
Page count: 3
Related papers
12 in total
[1]   A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: Support vector machine classification of peptide MS/MS spectra and SEQUEST scores [J].
Anderson, DC ;
Li, WQ ;
Payan, DG ;
Noble, WS .
JOURNAL OF PROTEOME RESEARCH, 2003, 2 (02) :137-146
[2]  
[Anonymous], P 5 ANN WORKSH COMP
[3]   AN APPROACH TO CORRELATE TANDEM MASS-SPECTRAL DATA OF PEPTIDES WITH AMINO-ACID-SEQUENCES IN A PROTEIN DATABASE [J].
ENG, JK ;
MCCORMACK, AL ;
YATES, JR .
JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY, 1994, 5 (11) :976-989
[4]   Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search [J].
Keller, A ;
Nesvizhskii, AI ;
Kolker, E ;
Aebersold, R .
ANALYTICAL CHEMISTRY, 2002, 74 (20) :5383-5392
[5]   Probability-based validation of protein identifications using a modified SEQUEST algorithm [J].
MacCoss, MJ ;
Wu, CC ;
Yates, JR .
ANALYTICAL CHEMISTRY, 2002, 74 (21) :5593-5599
[6]   Qscore: An algorithm for evaluating SEQUEST database search results [J].
Moore, RE ;
Young, MK ;
Lee, TD .
JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY, 2002, 13 (04) :378-386
[7]   Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: The yeast proteome [J].
Peng, JM ;
Elias, JE ;
Thoreen, CC ;
Licklider, LJ ;
Gygi, SP .
JOURNAL OF PROTEOME RESEARCH, 2003, 2 (01) :43-50
[8]  
Perkins DN, 1999, ELECTROPHORESIS, V20, P3551, DOI 10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2
[10]   Statistical significance for genomewide studies [J].
Storey, JD ;
Tibshirani, R .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2003, 100 (16) :9440-9445