Missing values in mass spectrometry based metabolomics: an undervalued step in the data processing pipeline

Cited by: 159
Authors
Hrydziuszko, Olga [2]
Viant, Mark R. [1,2]
Affiliations
[1] Univ Birmingham, Sch Biosci, Birmingham B15 2TT, W Midlands, England
[2] Univ Birmingham, Ctr Syst Biol, Birmingham B15 2TT, W Midlands, England
Keywords
FT-ICR; Metabolic profiling; Missing data; Missing entries; Signal processing; ORTHOTOPIC LIVER-TRANSPLANTATION; GEL-BASED PROTEOMICS; MICROARRAY DATA; VALUE IMPUTATION; PART I; EXPRESSION; BIOMARKERS; ACCURACY; GRAFT; NMR;
DOI
10.1007/s11306-011-0366-4
CLC classification number
R5 [Internal medicine];
Subject classification code
100201 [Internal medicine];
Abstract
Missing values in mass spectrometry metabolomic datasets occur widely and can originate from a number of sources, for both technical and biological reasons. Currently, little is known about these data, i.e. about their distributions across datasets, the need (or not) to consider them in the data processing pipeline, and most importantly, the optimal way of assigning them values prior to univariate or multivariate data analysis. Here, we address all of these issues using direct infusion Fourier transform ion cyclotron resonance mass spectrometry data. We have shown that missing data are widespread, accounting for ca. 20% of data and affecting up to 80% of all variables, and that they do not occur randomly but rather as a function of signal intensity and mass-to-charge ratio. We have demonstrated that missing data estimation algorithms have a major effect on the outcome of data analysis when comparing the differences between biological sample groups, including by t-test, ANOVA and principal component analysis. Furthermore, results varied significantly across the eight algorithms that we assessed for their ability to impute known, but labelled as missing, entries. Based on all of our findings we identified the k-nearest neighbour imputation method (KNN) as the optimal missing value estimation approach for our direct infusion mass spectrometry datasets. However, we believe the wider significance of this study is that it highlights the importance of missing metabolite levels in the data processing pipeline and offers an approach to identify optimal ways of treating missing data in metabolomics experiments.
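The abstract names k-nearest neighbour (KNN) imputation as the preferred approach but does not describe how it was implemented. The following is a minimal Python sketch of the general technique, not the authors' implementation. The function name `knn_impute`, the use of rows as neighbours, the root-mean-square distance over jointly observed columns, and the default `k=5` are all assumptions made for illustration.

```python
import numpy as np


def knn_impute(X, k=5):
    """Fill NaN entries of X with the mean of the k nearest rows.

    Assumption: neighbours are rows. The distance between two rows is
    the root-mean-square difference over the columns observed in both,
    so rows sharing different numbers of columns stay comparable.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    n_rows = X.shape[0]

    for i in np.where(missing.any(axis=1))[0]:
        # Distance from row i to every other row over shared observed columns.
        dists = np.full(n_rows, np.inf)
        for j in range(n_rows):
            if j == i:
                continue
            shared = ~missing[i] & ~missing[j]
            if shared.any():
                diff = X[i, shared] - X[j, shared]
                dists[j] = np.sqrt(np.mean(diff ** 2))

        order = np.argsort(dists)
        for col in np.where(missing[i])[0]:
            # Donors must be comparable to row i and observed in this column.
            donors = [j for j in order
                      if np.isfinite(dists[j]) and not missing[j, col]]
            if donors:
                filled[i, col] = X[donors[:k], col].mean()
            # With no donors the entry is left as NaN.
    return filled
```

Applied to an intensity matrix in which missing peaks are encoded as `NaN`, the function returns a copy with those entries filled and all observed entries unchanged. Whether neighbours should be samples or peaks, and which value of `k` to use, are choices that would need to be evaluated for a given dataset.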
Pages: S161-S174
Page count: 14
Related papers
40 in total
[1]
Missing values in gel-based proteomics [J].
Albrecht, Daniela ;
Kniemeyer, Olaf ;
Brakhage, Axel A. ;
Guthke, Reinhard .
PROTEOMICS, 2010, 10 (06) :1202-1211
[2]
Improving the speed of multi-way algorithms: Part I. Tucker3 [J].
Andersson, CA ;
Bro, R .
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 1998, 42 (1-2) :93-103
[3]
CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[4]
Fusion of metabolomics and proteomics data for biomarkers discovery: case study on the experimental autoimmune encephalomyelitis [J].
Blanchet, Lionel ;
Smolinska, Agnieszka ;
Attali, Amos ;
Stoop, Marcel P. ;
Ampt, Kirsten A. M. ;
van Aken, Hans ;
Suidgeest, Ernst ;
Tuinstra, Tinka ;
Wijmenga, Sybren S. ;
Luider, Theo ;
Buydens, Lutgarde M. C. .
BMC BIOINFORMATICS, 2011, 12
[5]
Statistical strategies for avoiding false discoveries in metabolomics and related experiments [J].
Broadhurst, David I. ;
Kell, Douglas B. .
METABOLOMICS, 2006, 2 (04) :171-196
[6]
Influence of microarrays experiments missing values on the stability of gene groups by hierarchical clustering [J].
de Brevern, AG ;
Hazout, S ;
Malpertuy, A .
BMC BIOINFORMATICS, 2004, 5 (1)
[7]
Gene expression profiling of human liver transplants identifies an early transcriptional signature associated with initial poor graft function [J].
Defamie, V. ;
Cursio, R. ;
Le Brigand, K. ;
Moreilhon, C. ;
Saint-Paul, M. -C. ;
Laurens, M. ;
Crenesse, D. ;
Cardinaud, B. ;
Auberger, P. ;
Gugenheim, J. ;
Barbry, P. ;
Mari, B. .
AMERICAN JOURNAL OF TRANSPLANTATION, 2008, 8 (06) :1221-1236
[8]
Probabilistic quotient normalization as robust method to account for dilution of complex biological mixtures. Application in 1H NMR metabonomics [J].
Dieterle, Frank ;
Ross, Alfred ;
Schlotterbeck, Gotz ;
Senn, Hans .
ANALYTICAL CHEMISTRY, 2006, 78 (13) :4281-4290
[9]
Metabolomics by numbers: acquiring and understanding global metabolite data [J].
Goodacre, R ;
Vaidyanathan, S ;
Dunn, WB ;
Harrigan, GG ;
Kell, DB .
TRENDS IN BIOTECHNOLOGY, 2004, 22 (05) :245-252
[10]
Application of Metabolomics to Investigate the Process of Human Orthotopic Liver Transplantation: A Proof-of-Principle Study [J].
Hrydziuszko, Olga ;
Silva, Michael A. ;
Perera, M. Thamara P. R. ;
Richards, Douglas A. ;
Murphy, Nick ;
Mirza, Darius ;
Viant, Mark R. .
OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY, 2010, 14 (02) :143-150