A posteriori quality control for the curation and reuse of public proteomics data

Times cited: 24
Authors
Foster, Joseph M. [1]
Degroeve, Sven [2,3]
Gatto, Laurent [4]
Visser, Matthieu [5]
Wang, Rui [1]
Griss, Johannes [6]
Apweiler, Rolf [1]
Martens, Lennart [2,3]
Affiliations
[1] European Bioinformat Inst, EMBL Outstn, Cambridge, England
[2] VIB, Dept Med Prot Res, Ghent, Belgium
[3] Univ Ghent, Dept Biochem, B-9000 Ghent, Belgium
[4] Univ Cambridge, Dept Biochem, Cambridge Ctr Prote, Cambridge Syst Biol Ctr, Cambridge CB2 1QW, England
[5] Philips Res Labs, Cambridge, England
[6] Med Univ Vienna, Dept Med, Vienna Gen Hosp, Vienna, Austria
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK
Keywords
Bioinformatics; PRIDE; Quality assurance; Quality control; TANDEM MASS-SPECTROMETRY; LARGE-SCALE PROTEOMICS; LIQUID-CHROMATOGRAPHY; PEPTIDE IDENTIFICATION; PROTEIN MIXTURES; PLASMA-PROTEOME; QUANTIFICATION; QUANTITATION; REPOSITORY; STRATEGY
DOI
10.1002/pmic.201000602
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline codes
071010; 081704
Abstract
Proteomics is a rapidly expanding field encompassing a multitude of complex techniques and data types. To date, much effort has been devoted to achieving the highest possible coverage of proteomes, with the aim of informing future developments in basic biology as well as in clinical settings. As a result, growing amounts of data have been deposited in publicly available proteomics databases. These data are in turn increasingly reused for orthogonal downstream purposes such as data mining and machine learning. These downstream uses, however, need ways to validate a posteriori whether a particular data set is suitable for the envisioned purpose. Furthermore, the (semi-)automatic curation of repository data depends on analyses that can highlight misannotation and edge conditions in data sets. Such curation is an important prerequisite for efficient reuse of proteomics data in the life sciences in general. We therefore present a selection of quality control metrics and approaches for the a posteriori detection of potential issues encountered in typical proteomics data sets. We illustrate our metrics using publicly available data from the Proteomics Identifications Database (PRIDE), and simultaneously show the usefulness of the large body of PRIDE data as a means to derive empirical background distributions for relevant metrics.
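The abstract's idea of using many repository data sets as an empirical background distribution for a quality metric can be illustrated with a short Python sketch. This is not the authors' implementation: the function names, the mid-rank percentile definition, and the 2.5%/97.5% tail cut-offs are assumptions chosen for illustration, and the metric itself (for instance, a hypothetical per-data-set identification rate) is left abstract.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def empirical_percentile(background: Sequence[float], value: float) -> float:
    """Position of `value` within an empirical background distribution.

    Uses a mid-rank convention (an assumption): values tied with `value`
    count half below and half above it.
    """
    if not background:
        raise ValueError("background distribution is empty")
    ordered = sorted(background)
    below = bisect_left(ordered, value)         # count of values strictly below
    at_or_below = bisect_right(ordered, value)  # count of values <= value
    return (below + at_or_below) / (2 * len(ordered))


def flag_dataset(background: Sequence[float], value: float,
                 lower: float = 0.025, upper: float = 0.975) -> bool:
    """Flag a data set whose metric falls in either tail of the background.

    The default cut-offs are hypothetical, not taken from the paper.
    """
    p = empirical_percentile(background, value)
    return p < lower or p > upper
```

For example, with a synthetic background consisting of the values 1 to 100, a data set scoring 50 sits at percentile 0.495 and is not flagged, whereas one scoring 200 falls beyond the upper cut-off and is flagged. Working from an empirical distribution rather than an assumed parametric one avoids presupposing how a metric is distributed across heterogeneous submissions.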
Pages: 2182-2194
Page count: 13