Characterization of ¹H NMR spectroscopic data and the generation of synthetic validation sets

Times cited: 8
Authors
Anderson, Paul E. [1]
Raymer, Michael L. [1]
Kelly, Benjamin J. [1]
Reo, Nicholas V. [2]
DelRaso, Nicholas J. [3]
Doom, T. E. [1]
Affiliations
[1] Wright State Univ, Dept Comp Sci & Engn, Dayton, OH 45435 USA
[2] Boonshoft Sch Med, Cox Inst, Dept Biochem & Mol Biol, Dayton, OH 45429 USA
[3] USAF, Res Lab, Biosci & Protect Div, Wright Patterson AFB, OH 45433 USA
Keywords
MAGNETIC-RESONANCE; TOXICITY CLASSIFICATION; METABONOMIC APPROACH; PATTERN-RECOGNITION; METABOLOMICS DATA; CLUSTER-ANALYSIS; PEAK ALIGNMENT; NMR; MS; TRANSFORM
DOI
10.1093/bioinformatics/btp540
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Common contemporary practice within the nuclear magnetic resonance (NMR) metabolomics community is to evaluate and validate novel algorithms on empirical data or simplified simulated data. Empirical data capture the complex characteristics of experimental data, but the optimal or most correct analysis is unknown a priori; researchers are therefore forced to rely on indirect performance metrics, which are of limited value. A fair and complete comparison of competing techniques requires more exacting metrics. For this reason, metabolomics researchers often evaluate their algorithms on simplified simulated data with a known answer. Unfortunately, conclusions drawn from simulated data are of value only if the data sets are complex enough for the results to generalize to true experimental data. Ideally, synthetic data should be indistinguishable from empirical data yet retain a known best analysis.

Results: We have developed a technique for creating realistic synthetic metabolomics validation sets based on NMR spectroscopic data. The validation sets are developed by characterizing the salient distributions in sets of empirical spectroscopic data. Using this technique, several validation sets are constructed with a variety of characteristics present in 'real' data. A case study then compares the relative accuracy of several alignment algorithms, using the increased precision afforded by these synthetic data sets.
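The abstract does not spell out the generative model, so the following is only a minimal Python sketch of the general idea: a synthetic spectrum whose "correct" analysis is known because every peak position, width and height was chosen by the generator. Everything in it is an assumption rather than the authors' method: Lorentzian lineshapes, a normal distribution for per-sample peak shifts (standing in for misalignment), a log-normal distribution for concentration scaling, additive Gaussian noise, the parameter values, and the hypothetical names `Peak`, `lorentzian` and `synth_spectrum`. In the paper, the corresponding distributions are characterized from empirical data instead of being picked by hand.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Peak:
    """One resonance of a hypothetical metabolite (illustrative values only)."""
    center_ppm: float  # nominal chemical shift
    hwhm_ppm: float    # half-width at half-maximum of the Lorentzian
    height: float      # relative peak height before concentration scaling


def lorentzian(x, center, hwhm, height):
    """Lorentzian lineshape with maximum `height` at `center`."""
    return height * hwhm ** 2 / ((x - center) ** 2 + hwhm ** 2)


def synth_spectrum(peaks, ppm_axis, rng,
                   shift_sd=0.005, conc_sigma=0.2, noise_sd=0.01):
    """Return (intensities, true_centers) for one synthetic sample.

    Assumed generative model (hypothetical, not taken from the paper):
      - each peak position is perturbed by N(0, shift_sd) ppm;
      - each peak height is scaled by a log-normal factor LogN(0, conc_sigma);
      - i.i.d. Gaussian noise N(0, noise_sd) is added to every point.
    true_centers is the known ground truth an alignment algorithm
    can be scored against.
    """
    intensities = [0.0] * len(ppm_axis)
    true_centers = []
    for p in peaks:
        center = p.center_ppm + rng.gauss(0.0, shift_sd)
        scale = rng.lognormvariate(0.0, conc_sigma)
        true_centers.append(center)
        for i, x in enumerate(ppm_axis):
            intensities[i] += lorentzian(x, center, p.hwhm_ppm, p.height * scale)
    intensities = [y + rng.gauss(0.0, noise_sd) for y in intensities]
    return intensities, true_centers


# Demo: one sample on a 0.5-4.0 ppm axis sampled every 0.001 ppm,
# with two illustrative peaks.
rng = random.Random(0)
axis = [0.5 + i * 0.001 for i in range(3501)]
peaks = [Peak(1.33, 0.002, 1.0), Peak(3.03, 0.002, 0.6)]
spectrum, truth = synth_spectrum(peaks, axis, rng)
```

Generating many samples with different seeds yields a set of spectra whose true peak positions are recorded in `truth`, so an alignment algorithm's output can be compared directly against known values instead of an indirect metric. Using an explicit `random.Random` instance keeps each validation set reproducible from its seed.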
Pages: 2992–3000
Page count: 9