Scaling and normalization effects in NMR spectroscopic metabonomic data sets

Cited by: 392
Authors
Craig, A [1]
Cloarec, O [1]
Holmes, E [1]
Nicholson, JK [1]
Lindon, JC [1]
Affiliation
[1] Univ London Imperial Coll Sci Technol & Med, Fac Nat Sci, London SW7 2AZ, England
Funding
Wellcome Trust (UK);
Keywords
DOI
10.1021/ac0519312
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
Considerable confusion appears to exist in the metabonomics literature as to the real need for, and the role of, preprocessing the acquired spectroscopic data. A number of studies have presented various data manipulation approaches, some suggesting an optimum method. In metabonomics, data are usually presented as a table where each row relates to a given sample or analytical experiment and each column corresponds to a single measurement in that experiment, typically individual spectral peak intensities or metabolite concentrations. Here we suggest definitions for and discuss the operations usually termed normalization (a table row operation) and scaling (a table column operation) and demonstrate their need in 1H NMR spectroscopic data sets derived from urine. The problems associated with "binned" data (i.e., values integrated over discrete spectral regions) are also discussed, and the particular biological context problems of analytical data on urine are highlighted. It is shown that care must be exercised in calculation of correlation coefficients for data sets where normalization to a constant sum is used. Analogous considerations will be needed for other biofluids, other analytical approaches (e.g., HPLC-MS), and indeed for other "omics" techniques (i.e., transcriptomics or proteomics) and for integrated studies with "fused" data sets. It is concluded that data preprocessing is context dependent and there can be no single method for general use.
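The row/column distinction drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The function names, the target sum of 100, the choice of unit-variance scaling as the column operation, and the synthetic data are all assumptions made for illustration. The sketch also shows why the abstract urges care with correlation coefficients after constant-sum normalization: variables that are independent in the raw table become negatively correlated once each row is forced to the same total.

```python
import math
import random


def normalize_constant_sum(table, total=100.0):
    """Row operation: rescale each sample (row) so its values sum to `total`."""
    return [[total * v / sum(row) for v in row] for row in table]


def scale_unit_variance(table):
    """Column operation: mean-center each variable (column), divide by its SD."""
    cols = list(zip(*table))
    means = [sum(c) / len(c) for c in cols]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in table]


def column(table, j):
    """Extract column j of a row-major table as a list."""
    return [row[j] for row in table]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic table (assumption): 500 samples x 3 independently drawn variables.
rng = random.Random(0)
raw = [[rng.uniform(1.0, 10.0) for _ in range(3)] for _ in range(500)]

normalized = normalize_constant_sum(raw)   # acts along rows
scaled = scale_unit_variance(normalized)   # acts along columns

r_raw = pearson(column(raw, 0), column(raw, 1))
r_norm = pearson(column(normalized, 0), column(normalized, 1))
print(f"correlation before normalization: {r_raw:+.3f}")
print(f"correlation after constant-sum normalization: {r_norm:+.3f}")
```

With three exchangeable variables, the constant-sum constraint pushes each pairwise correlation toward -1/2, so the second printed value should be strongly negative even though the raw variables were generated independently.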
Pages: 2262-2267
Page count: 6