iProphet: Multi-level Integrative Analysis of Shotgun Proteomic Data Improves Peptide and Protein Identification Rates and Error Estimates

Cited by: 441
Authors
Shteynberg, David [1]
Deutsch, Eric W. [1]
Lam, Henry [2]
Eng, Jimmy K. [3]
Sun, Zhi [1]
Tasman, Natalie [1]
Mendoza, Luis [1]
Moritz, Robert L. [1]
Aebersold, Ruedi [4, 5, 6]
Nesvizhskii, Alexey I. [7, 8]
Affiliations
[1] Inst Syst Biol, Seattle, WA USA
[2] Hong Kong Univ Sci & Technol, Dept Chem & Biomol Engn, Hong Kong, Hong Kong, Peoples R China
[3] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[4] ETH, Dept Biol, Inst Mol Syst Biol, Zurich, Switzerland
[5] Univ Zurich, Fac Sci, Zurich, Switzerland
[6] Ctr Syst Physiol & Metab Dis, Zurich, Switzerland
[7] Univ Michigan, Dept Pathol, Ann Arbor, MI 48105 USA
[8] Univ Michigan, Ctr Computat Med & Bioinformat, Ann Arbor, MI 48105 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
FALSE DISCOVERY RATES; SPECTROMETRY-BASED PROTEOMICS; TANDEM MASS-SPECTROMETRY; SPECTRAL DATA; BIOINFORMATICS TOOLS; SEQUENCE DATABASES; STATISTICAL-MODEL; SEARCH STRATEGY; DECOY DATABASES; VALIDATION;
DOI
10.1074/mcp.M111.007690
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The combination of tandem mass spectrometry and sequence database searching is the method of choice for the identification of peptides and the mapping of proteomes. Over the last several years, the volume of data generated in proteomic studies has increased dramatically, which challenges the computational approaches previously developed for these data. Furthermore, a multitude of search engines have been developed that identify different, overlapping subsets of the sample peptides from a particular set of tandem mass spectrometry spectra. We present iProphet, the new addition to the widely used open-source suite of proteomic data analysis tools Trans-Proteomics Pipeline. Applied in tandem with PeptideProphet, it provides more accurate representation of the multilevel nature of shotgun proteomic data. iProphet combines the evidence from multiple identifications of the same peptide sequences across different spectra, experiments, precursor ion charge states, and modified states. It also allows accurate and effective integration of the results from multiple database search engines applied to the same data. The use of iProphet in the Trans-Proteomics Pipeline increases the number of correctly identified peptides at a constant false discovery rate as compared with both PeptideProphet and another state-of-the-art tool Percolator. As the main outcome, iProphet permits the calculation of accurate posterior probabilities and false discovery rate estimates at the level of sequence identical peptide identifications, which in turn leads to more accurate probability estimates at the protein level. Fully integrated with the Trans-Proteomics Pipeline, it supports all commonly used MS instruments, search engines, and computer platforms. 
The performance of iProphet is demonstrated on two publicly available data sets: data from a human whole cell lysate proteome profiling experiment representative of typical proteomic data sets, and data from a set of Streptococcus pyogenes experiments more representative of organism-specific composite data sets. Molecular & Cellular Proteomics 10: 10.1074/mcp.M111.007690, 1-15, 2011.
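The abstract links posterior probabilities to false discovery rate estimates. As a minimal sketch of that general relationship, and not of iProphet's own code, the expected FDR among identifications accepted at a probability threshold can be taken as the mean of (1 - p) over the accepted set. The function name `estimated_fdr` and the probability values below are made up for illustration.

```python
def estimated_fdr(probabilities, threshold):
    """Model-based FDR estimate for identifications with probability >= threshold.

    Each accepted identification with posterior probability p is expected to be
    wrong with probability (1 - p), so the expected fraction of false
    identifications is the mean of (1 - p) over the accepted set.
    """
    accepted = [p for p in probabilities if p >= threshold]
    if not accepted:
        return 0.0  # nothing accepted, so no false discoveries to report
    return sum(1.0 - p for p in accepted) / len(accepted)


# Hypothetical posterior probabilities for five peptide identifications.
example_probabilities = [0.99, 0.95, 0.90, 0.60, 0.20]
print(estimated_fdr(example_probabilities, threshold=0.90))
```

Lowering the threshold accepts more identifications but raises the estimated FDR, which is why comparisons between tools are made at a constant FDR.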
Pages: 15