pep2pro: a new tool for comprehensive proteome data analysis to reveal information about organ-specific proteomes in Arabidopsis thaliana

Cited by: 60
Authors
Baerenfaller, Katja [1]
Hirsch-Hoffmann, Matthias [1]
Svozil, Julia [1]
Hull, Roger [2]
Russenberger, Doris [1]
Bischof, Sylvain [1]
Lu, Qingtao [3]
Gruissem, Wilhelm [1]
Baginsky, Sacha [1]
Affiliations
[1] ETH, Dept Biol, CH-8092 Zurich, Switzerland
[2] Univ Manchester, Fac Life Sci, Manchester M13 9PL, Lancs, England
[3] Chinese Acad Sci, Inst Bot, Beijing 100093, Peoples R China
Keywords
STATISTICAL-MODEL; TANDEM; GENES; PROTEINS; IDENTIFICATIONS; PEROXISOMES; ANNOTATION; PEPTIDES; GENOMICS; MS/MS
DOI
10.1039/c0ib00078g
Chinese Library Classification (CLC) number
Q2 [Cell biology]
Subject classification codes
071009; 090102
Abstract
pep2pro is a comprehensive proteome analysis database specifically suitable for flexible proteome data analysis. The pep2pro database schema offers solutions to the various challenges of developing a proteome data analysis database and because data integrated in pep2pro are in relational format, it enables flexible and detailed data analysis. The information provided here will facilitate building proteome data analysis databases for other organisms or applications. The capacity of the pep2pro database for the integration and analysis of large proteome datasets was demonstrated by creating the pep2pro dataset, which is an organ-specific characterisation of the Arabidopsis thaliana proteome containing 14 522 identified proteins based on 2.6 million peptide spectrum assignments. This dataset provides evidence of protein expression and reveals organ-specific processes. The high coverage and density of the dataset are essential for protein quantification by normalised spectral counting and allowed us to extract information that is usually not accessible in low-coverage datasets. With this quantitative protein information we analysed organ- and organelle-specific sub-proteomes. In addition we matched spectra to regions in the genome that were not predicted to have protein coding capacity and provide PCR validation for selected revised gene models. Furthermore, we analysed the peptide features that distinguish detected from non-detected peptides and found substantial disagreement between predicted and detected proteotypic peptides, suggesting that large-scale proteomics data are essential for efficient selection of proteotypic peptides in targeted proteomics surveys. The pep2pro dataset is available as a resource for plant systems biology at www.pep2pro.ethz.ch.
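The abstract refers to protein quantification by normalised spectral counting without stating the exact normalisation. As a minimal sketch, assuming a length-normalised scheme in which each protein's spectral count is divided by its sequence length and then scaled so that all values sum to 1, the idea could look as follows. The function name, protein identifiers, and numbers are hypothetical illustrations and are not taken from pep2pro.

```python
from typing import Dict, Tuple


def normalised_spectral_counts(
    proteins: Dict[str, Tuple[int, int]]
) -> Dict[str, float]:
    """Convert {protein_id: (spectral_count, sequence_length)} into
    length-normalised abundance fractions that sum to 1.

    Assumption: this is one common normalisation, not necessarily
    the one used for the pep2pro dataset.
    """
    per_residue = {}
    for protein_id, (count, length) in proteins.items():
        if length <= 0:
            raise ValueError(f"sequence length must be positive: {protein_id}")
        # Longer proteins yield more peptides, so scale counts by length.
        per_residue[protein_id] = count / length

    total = sum(per_residue.values())
    if total == 0:
        # No spectra at all: report zero abundance for every protein.
        return {protein_id: 0.0 for protein_id in per_residue}

    # Scale so that values are comparable across samples of different depth.
    return {protein_id: value / total for protein_id, value in per_residue.items()}


# Hypothetical input: (spectral count, sequence length in residues).
example = {
    "protein_A": (120, 400),
    "protein_B": (30, 200),
    "protein_C": (15, 100),
}
result = normalised_spectral_counts(example)
```

In this toy input, protein_B and protein_C receive the same normalised value even though their raw counts differ, because their counts per residue are equal.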
Pages: 225-237
Number of pages: 13