The PSI semantic validator: A framework to check MIAPE compliance of proteomics data

Times cited: 36
Authors
Montecchi-Palazzi, Luisa [1]
Kerrien, Samuel [1]
Reisinger, Florian [1]
Aranda, Bruno [1]
Jones, Andrew R. [2]
Martens, Lennart [1]
Hermjakob, Henning [1]
Affiliations
[1] European Bioinformat Inst, EMBL, Cambridge, England
[2] Univ Liverpool, Fac Vet Sci, Dept Preclin Vet Sci, Liverpool L69 3BX, Merseyside, England
Funding
Wellcome Trust (UK)
Keywords
Bioinformatics; Data mining; Semantic validation; Gel electrophoresis; Minimum information; Mass spectrometry; Guidelines; Format; Tool
DOI
10.1002/pmic.200900189
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The Human Proteome Organization's Proteomics Standards Initiative (PSI) promotes the development of exchange standards to improve data integration and interoperability. PSI specifies the level of detail required when reporting a proteomics experiment (via the Minimum Information About a Proteomics Experiment, MIAPE), and provides extensible markup language (XML) exchange formats and dedicated controlled vocabularies (CVs) that must be combined to generate a standard-compliant document. The framework presented here checks that experimental data reported using a specific format, CVs and public bio-ontologies (e.g. Gene Ontology, NCBI taxonomy) comply with the MIAPE recommendations. The semantic validator not only checks the XML syntax but also enforces rules on the use of ontology classes or CV terms, by checking that the terms exist in the resource and that they are used in the correct location of a document. Moreover, this framework is extremely fast, even on sizable data files, and flexible, as it can be adapted to any standard by customizing the parameters it requires: an XML Schema Definition, one or more CVs or ontologies, and a mapping file describing in a formal way how the semantic resources and the format are interrelated. As such, the validator provides a general solution to a common problem in data exchange: how to validate the correct usage of a data standard beyond simple XML Schema Definition validation. The framework source code and its various applications can be found at http://psidev.info/validator.
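The two semantic checks described in the abstract (a term must exist in the CV, and it must be used at an allowed location) can be illustrated with a minimal sketch. This is not the PSI validator's code or its mapping-file format: the element names, the `EX:` accessions, the `CV` and `MAPPING` structures, and the `validate` function are all hypothetical, and XML Schema Definition validation is left out because the Python standard library does not provide it.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary: accession -> term name.
# These accessions are invented for illustration only.
CV = {
    "EX:0001": "instrument model",
    "EX:0002": "ionization type",
    "EX:0003": "sample species",
}

# Hypothetical mapping rules: element path -> accessions allowed there.
# A real mapping file would be a separate document; a dict stands in for it.
MAPPING = {
    "./instrument/cvParam": {"EX:0001", "EX:0002"},
    "./sample/cvParam": {"EX:0003"},
}


def validate(xml_text, cv=CV, mapping=MAPPING):
    """Return a list of messages; an empty list means no rule was violated."""
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    messages = []
    for path, allowed in mapping.items():
        for elem in root.findall(path):
            acc = elem.get("accession")
            if acc not in cv:
                # Check 1: the term must exist in the vocabulary.
                messages.append(f"{path}: unknown term {acc}")
            elif acc not in allowed:
                # Check 2: the term must be permitted at this location.
                messages.append(f"{path}: term {acc} not allowed here")
    return messages


DOC = """<experiment>
  <instrument>
    <cvParam accession="EX:0001"/>
    <cvParam accession="EX:0003"/>
  </instrument>
  <sample>
    <cvParam accession="EX:9999"/>
  </sample>
</experiment>"""

for msg in validate(DOC):
    print(msg)
```

In this sketch the second `cvParam` under `instrument` is flagged because its term exists but belongs elsewhere, and the one under `sample` is flagged because its accession is not in the vocabulary. Keeping the rules in data rather than in code mirrors the abstract's point that the framework is adapted to a new standard by swapping its parameters.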
Pages: 5112-5119
Number of pages: 8