Protein Identification False Discovery Rates for Very Large Proteomics Data Sets Generated by Tandem Mass Spectrometry

Cited by: 248
Authors
Reiter, Lukas [1, 2, 3, 4, 5, 6]
Claassen, Manfred [1, 7, 8]
Schrimpf, Sabine P. [2, 3, 4]
Jovanovic, Marko [2, 3, 4, 5, 6]
Schmidt, Alexander [1]
Buhmann, Joachim M. [7, 8]
Hengartner, Michael O. [2, 3, 4, 5, 6]
Aebersold, Ruedi [1, 8, 9]
Affiliations
[1] ETH, Inst Mol Syst Biol, CH-8093 Zurich, Switzerland
[2] Univ Zurich, Inst Mol Biol, CH-8057 Zurich, Switzerland
[3] Univ Zurich, Ctr Model Organism Proteomes, CH-8057 Zurich, Switzerland
[4] Univ Zurich, Fac Sci, CH-8057 Zurich, Switzerland
[5] Univ Zurich, Program Mol Life Sci Zurich, CH-8057 Zurich, Switzerland
[6] ETH, CH-8057 Zurich, Switzerland
[7] ETH, Inst Computat Sci, CH-8092 Zurich, Switzerland
[8] Competence Ctr Syst Physiol & Metab Dis, CH-8093 Zurich, Switzerland
[9] Inst Syst Biol, Seattle, WA 98103 USA
Funding
Swiss National Science Foundation;
Keywords
STATISTICAL-MODEL; PILOT PHASE; PEPTIDES; VALIDATION; SEQUENCES; ABUNDANCE; PARALLEL; GENOME; MS/MS;
DOI
10.1074/mcp.M900317-MCP200
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Comprehensive characterization of a proteome is a fundamental goal in proteomics. To achieve saturation coverage of a proteome or specific subproteome via tandem mass spectrometric identification of tryptic protein sample digests, proteomics data sets are growing dramatically in size and heterogeneity. The trend toward very large integrated data sets poses so far unsolved challenges to control the uncertainty of protein identifications going beyond well established confidence measures for peptide-spectrum matches. We present MAYU, a novel strategy that reliably estimates false discovery rates for protein identifications in large scale data sets. We validated and applied MAYU using various large proteomics data sets. The data show that the size of the data set has an important and previously underestimated impact on the reliability of protein identifications. We particularly found that protein false discovery rates are significantly elevated compared with those of peptide-spectrum matches. The function provided by MAYU is critical to control the quality of proteome data repositories and thereby to enhance any study relying on these data sources. The MAYU software is available as standalone software and also integrated into the Trans-Proteomic Pipeline. Molecular & Cellular Proteomics 8: 2405-2417, 2009.
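The abstract's observation that protein-level false discovery rates exceed those of peptide-spectrum matches (PSMs), and grow with data set size, can be illustrated with a back-of-the-envelope calculation. The sketch below is not the MAYU estimator. It assumes, purely for illustration, that false PSMs land uniformly at random on database proteins while true PSMs pile up on a fixed set of present proteins. The function name and all numbers are hypothetical.

```python
def expected_protein_fdr(n_psms, psm_fdr, n_db_proteins, n_true_proteins):
    """Expected protein-level FDR under a toy uniform-scatter assumption.

    Assumptions (illustrative only, not taken from the paper):
    - n_psms * psm_fdr PSMs are false and hit database proteins uniformly.
    - All n_true_proteins present in the sample are correctly identified.
    - A protein counts as identified if at least one PSM maps to it.
    """
    n_false_psms = n_psms * psm_fdr
    # Probability that a given protein receives at least one false PSM.
    p_hit = 1.0 - (1.0 - 1.0 / n_db_proteins) ** n_false_psms
    # Only proteins absent from the sample can become false identifications.
    n_absent = n_db_proteins - n_true_proteins
    expected_false_proteins = n_absent * p_hit
    return expected_false_proteins / (n_true_proteins + expected_false_proteins)


# Hypothetical data sets: same 1% PSM FDR, same proteome, different sizes.
small = expected_protein_fdr(10_000, 0.01, 20_000, 4_000)
large = expected_protein_fdr(100_000, 0.01, 20_000, 4_000)
print(f"protein FDR, 10k PSMs:  {small:.3f}")
print(f"protein FDR, 100k PSMs: {large:.3f}")
```

With these made-up numbers, a 1% PSM-level rate corresponds to a protein-level rate of about 16% for the larger data set, because correct PSMs accumulate on proteins that are already identified while incorrect ones keep adding new, wrong proteins.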
Pages: 2405-2417
Page count: 13