A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics

Cited by: 389
Authors
Nesvizhskii, Alexey I. [1, 2]
Affiliations
[1] Univ Michigan, Dept Pathol, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Ctr Computat Med & Bioinformat, Ann Arbor, MI 48109 USA
Keywords
Proteomics; Bioinformatics; Mass spectrometry; Peptide identification; Protein inference; Statistical models; False discovery rates; TANDEM MASS-SPECTROMETRY; FALSE DISCOVERY RATES; LARGE-SCALE PROTEOMICS; SEARCHING SEQUENCE DATABASES; CHARGE-STATE DETERMINATION; QUADRUPOLE COLLISION CELL; SPECTRAL LIBRARY SEARCH; POSTTRANSLATIONAL MODIFICATIONS; MS/MS SPECTRA; LIQUID-CHROMATOGRAPHY;
DOI
10.1016/j.jprot.2010.08.009
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide-to-spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from the peptide to the protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues. (C) 2010 Elsevier B.V. All rights reserved.
Pages: 2092-2123
Page count: 32