Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation

Cited by: 172
Authors
Parsons, Helen M.
Ludwig, Christian
Guenther, Ulrich L.
Viant, Mark R. [1]
Affiliations
[1] Univ Birmingham, Ctr Syst Biol, Birmingham B15 2TT, W Midlands, England
[2] Univ Birmingham, Biomol NMR Spect, Birmingham B15 2TT, W Midlands, England
[3] Univ Birmingham, Sch Biosci, Birmingham B15 2TT, W Midlands, England
Funding
Natural Environment Research Council (UK)
DOI
10.1186/1471-2105-8-234
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Classifying nuclear magnetic resonance (NMR) spectra is a crucial step in many metabolomics experiments. Since several multivariate classification techniques depend upon the variance of the data, it is important to first minimise any contribution from unwanted technical variance arising from sample preparation and analytical measurements, and thereby maximise any contribution from wanted biological variance between different classes. The generalised logarithm (glog) transform was developed to stabilise the variance in DNA microarray datasets, but has rarely been applied to metabolomics data. In particular, it has not been rigorously evaluated against other scaling techniques used in metabolomics, nor tested on all forms of NMR spectra including 1-dimensional (1D) ¹H, projections of 2D ¹H,¹H J-resolved (pJRES), and intact 2D J-resolved (JRES).
Results: Here, the effects of the glog transform are compared against two commonly used variance stabilising techniques, autoscaling and Pareto scaling, as well as unscaled data. The four methods are evaluated in terms of the effects on the variance of NMR metabolomics data and on the classification accuracy following multivariate analysis, the latter achieved using principal component analysis followed by linear discriminant analysis. For two of three datasets analysed, classification accuracies were highest following glog transformation: 100% accuracy for discriminating 1D NMR spectra of hypoxic and normoxic invertebrate muscle, and 100% accuracy for discriminating 2D JRES spectra of fish livers sampled from two rivers. For the third dataset, pJRES spectra of urine from two breeds of dog, the glog transform and autoscaling achieved equal highest accuracies. Additionally we extended the glog algorithm to effectively suppress noise, which proved critical for the analysis of 2D JRES spectra.
Conclusion: We have demonstrated that the glog and extended glog transforms stabilise the technical variance in NMR metabolomics datasets. This significantly improves the discrimination between sample classes and has resulted in higher classification accuracies compared to unscaled, autoscaled or Pareto scaled data. Additionally we have confirmed the broad applicability of the glog approach using three disparate datasets from different biological samples using 1D NMR spectra, 1D projections of 2D JRES spectra, and intact 2D JRES spectra.
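As an illustration of the three scalings the abstract compares, the following is a minimal sketch rather than the authors' implementation. It assumes the commonly used form of the generalised logarithm, ln(y + sqrt(y^2 + lambda)). The value of lambda, the toy intensity matrix, and all function names are hypothetical; the estimation of lambda from technical replicates and the extended glog are not shown.

```python
import math
from statistics import mean, stdev


def glog(y, lam):
    """Generalised log, assumed form: ln(y + sqrt(y^2 + lam)).

    With lam > 0 it is defined for zero and negative intensities,
    and it approaches ln(2y) for large y.
    """
    return math.log(y + math.sqrt(y * y + lam))


def glog_transform(spectra, lam):
    """Apply glog element-wise to a samples-by-variables matrix (list of rows)."""
    return [[glog(v, lam) for v in row] for row in spectra]


def column_scale(spectra, power):
    """Mean-centre each variable (column) and divide by sd ** power.

    power = 1.0 gives autoscaling; power = 0.5 gives Pareto scaling.
    Assumes every column has non-zero standard deviation.
    """
    scaled_cols = []
    for col in zip(*spectra):
        m, s = mean(col), stdev(col)
        scaled_cols.append([(v - m) / (s ** power) for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def autoscale(spectra):
    return column_scale(spectra, 1.0)


def pareto_scale(spectra):
    return column_scale(spectra, 0.5)


# Hypothetical toy data: 3 spectra (rows) x 3 spectral bins (columns).
spectra = [
    [100.0, 1.0, 0.2],
    [120.0, 1.5, -0.1],
    [90.0, 0.8, 0.1],
]
lam = 1.0  # hypothetical transformation parameter

glogged = glog_transform(spectra, lam)
auto = autoscale(spectra)
pareto = pareto_scale(spectra)
```

Autoscaling gives every bin unit variance, so noise bins count as much as large peaks. Pareto scaling divides by the square root of the standard deviation and so retains part of the original intensity structure. The glog transform behaves like a logarithm for large intensities while remaining finite near zero.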
Pages: 16