Preprocessing, classification modeling and feature selection using flow injection electrospray mass spectrometry metabolite fingerprint data

Times cited: 81
Authors
Enot, David P. [1]
Lin, Wanchang [1]
Beckmann, Manfred [1]
Parker, David [1]
Overy, David P. [1]
Draper, John [1]
Affiliations
[1] Aberystwyth Univ, Inst Biol Sci, Aberystwyth SY23 3DA, Dyfed, Wales
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK
DOI
10.1038/nprot.2007.511
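Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry]
Abstract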
Metabolome analysis by flow injection electrospray mass spectrometry (FIE-MS) fingerprinting generates measurements relating to large numbers of m/z signals. Such data sets often exhibit high variance with a paucity of replicates, thus providing a challenge for data mining. We describe data preprocessing and modeling methods that have proved reliable in projects involving samples from a range of organisms. The protocols interact with software resources specifically for metabolomics provided in a Web-accessible data analysis package FIEmspro (http://users.aber.ac.uk/jhd) written in the R environment and requiring a moderate knowledge of R command-line usage. Specific emphasis is placed on describing the outcome of modeling experiments using FIE-MS data that require further preprocessing to improve quality. The salient features of both poor and robust (i.e., highly generalizable) multivariate models are outlined together with advice on validating classifiers and avoiding false discovery when seeking explanatory variables.
Pages: 446-470
Number of pages: 25