Data management and preliminary data analysis in the pilot phase of the HUPO Plasma Proteome Project

Cited by: 43
Authors
Adamski, M
Blackwell, T
Menon, R
Martens, L
Hermjakob, H
Taylor, C
Omenn, GS
States, DJ
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] State Univ Ghent, B-9000 Ghent, Belgium
[3] European Bioinformat Inst, Hinxton, England
Keywords
HUPO; Plasma Proteome Project; protein identification; proteomics database; proteomics data integration
DOI
10.1002/pmic.200500186
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
The pilot phase of the HUPO Plasma Proteome Project (PPP) is an international collaboration to catalog the protein composition of human blood plasma and serum by analyzing standardized aliquots of reference serum and plasma specimens using a variety of experimental techniques. Data management for this project included collection, integration, analysis, and dissemination of findings from participating organizations world-wide. Accomplishing this task required a communication and coordination infrastructure specific enough to support meaningful integration of results from all participants, but flexible enough to react to changing requirements and new insights gained during the course of the project and to allow participants with varying informatics capabilities to contribute. Challenges included integrating heterogeneous data, reducing redundant information to minimal identification sets, and data annotation. Our data integration workflow assembles a minimal and representative set of protein identifications, which account for the contributed data. It accommodates incomplete concordance of results from different laboratories, ambiguity and redundancy in contributed identifications, and redundancy in the protein sequence databases. Recommendations of the PPP for future large-scale proteomics endeavors are described.
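The abstract does not say which algorithm the workflow uses to reduce redundant identifications to a minimal set. As an illustration only, the sketch below assumes a greedy set-cover heuristic, a common approach to this kind of problem: it repeatedly selects the candidate protein that accounts for the most still-unexplained peptides. The function name, the accession labels, and the peptide labels are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def minimal_protein_set(protein_to_peptides: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: choose proteins until every observed peptide is explained.

    Illustrative assumption only; not the PPP's published procedure.
    """
    # All peptides that must be accounted for by the chosen proteins.
    uncovered: Set[str] = set().union(*protein_to_peptides.values())
    chosen: List[str] = []
    while uncovered:
        # Pick the protein explaining the most uncovered peptides.
        # Iterating in sorted order makes ties resolve to the smallest accession.
        best = max(
            sorted(protein_to_peptides),
            key=lambda acc: len(protein_to_peptides[acc] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


# Hypothetical contributed identifications (accession -> matched peptides).
candidates = {
    "PROT_A": {"pep1", "pep2", "pep3"},
    "PROT_B": {"pep1", "pep2"},  # redundant: its peptides are a subset of PROT_A's
    "PROT_C": {"pep4"},
    "PROT_D": {"pep3", "pep4"},
}

representative = minimal_protein_set(candidates)
print(representative)
```

A greedy cover is not guaranteed to be the smallest possible set, but it is simple and deterministic. In this toy input, PROT_B is dropped because PROT_A already accounts for its peptides.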
Pages: 3246-3261
Page count: 16
Related papers
23 in total
[1]   Gene Ontology: tool for the unification of biology [J].
Ashburner, M ;
Ball, CA ;
Blake, JA ;
Botstein, D ;
Butler, H ;
Cherry, JM ;
Davis, AP ;
Dolinski, K ;
Dwight, SS ;
Eppig, JT ;
Harris, MA ;
Hill, DP ;
Issel-Tarver, L ;
Kasarskis, A ;
Lewis, S ;
Matese, JC ;
Richardson, JE ;
Ringwald, M ;
Rubin, GM ;
Sherlock, G .
NATURE GENETICS, 2000, 25 (01) :25-29
[2]   Improving large-scale proteomics by clustering of mass spectrometry data [J].
Beer, I ;
Barnea, E ;
Ziv, T ;
Admon, A .
PROTEOMICS, 2004, 4 (04) :950-960
[3]  
Blaschke Christian, 2002, Brief Bioinform, V3, P154, DOI 10.1093/bib/3.2.154
[4]   The need for guidelines in publication of peptide and protein identification data - Working group on publication guidelines for peptide and protein identification data [J].
Carr, S ;
Aebersold, R ;
Baldwin, M ;
Burlingame, A ;
Clauser, K ;
Nesvizhskii, A .
MOLECULAR & CELLULAR PROTEOMICS, 2004, 3 (06) :531-533
[5]  
Desiere F, 2005, GENOME BIOL, V6
[6]   Pedro: a configurable data entry tool for XML [J].
Garwood, KL ;
Taylor, CF ;
Runte, KJ ;
Brass, A ;
Oliver, SG ;
Paton, NW .
BIOINFORMATICS, 2004, 20 (15) :2463-2465
[7]   Global analysis of protein expression in yeast [J].
Ghaemmaghami, S ;
Huh, W ;
Bower, K ;
Howson, RW ;
Belle, A ;
Dephoure, N ;
O'Shea, EK ;
Weissman, JS .
NATURE, 2003, 425 (6959) :737-741
[8]   Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search [J].
Keller, A ;
Nesvizhskii, AI ;
Kolker, E ;
Aebersold, R .
ANALYTICAL CHEMISTRY, 2002, 74 (20) :5383-5392
[9]  
Keller Andrew, 2002, OMICS A Journal of Integrative Biology, V6, P207, DOI 10.1089/153623102760092805
[10]   The International Protein Index: An integrated database for proteomics experiments [J].
Kersey, PJ ;
Duarte, J ;
Williams, A ;
Karavidopoulou, Y ;
Birney, E ;
Apweiler, R .
PROTEOMICS, 2004, 4 (07) :1985-1988