Do we want our data raw? Including binary mass spectrometry data in public proteomics data repositories

Times cited: 38
Authors
Martens, L
Nesvizhskii, AI
Hermjakob, H
Adamski, M
Omenn, GS
Vandekerckhove, J
Gevaert, K
Affiliations
[1] Univ Ghent, Dept Biochem, Fac Med & Hlth Sci, B-9000 Ghent, Belgium
[2] Inst Syst Biol, Seattle, WA USA
[3] European Bioinformat Inst, EMBL Outstn, Cambridge, England
[4] Univ Michigan, Dept Human Genet, Ann Arbor, MI 48109 USA
Keywords
bioinformatics; databases; mass spectrometry
DOI
10.1002/pmic.200401302
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
With the human Plasma Proteome Project (PPP) pilot phase completed, the largest and most ambitious proteomics experiment to date has reached its first milestone. The correspondingly impressive amount of data that came from this pilot project emphasized the need for a centralized dissemination mechanism and led to the development of a detailed, PPP-specific data-gathering infrastructure at the University of Michigan, Ann Arbor, as well as the protein identifications database project at the European Bioinformatics Institute as a general proteomics data repository. One issue that cropped up while discussing which data to store for the PPP concerns whether the raw, binary data coming from the mass spectrometers should be stored, or rather the more compact and already significantly processed peak lists. As this debate is not restricted to the PPP but relates to the proteomics community in general, we will attempt to detail the relative merits and caveats associated with centralized storage and dissemination of raw data and/or peak lists, building on the extensive experience gained during the PPP pilot phase. Finally, some suggestions are made for both immediate and future storage of MS data in public repositories.
Pages: 3501-3505
Number of pages: 5