A consensus orthogonal partial least squares discriminant analysis (OPLS-DA) strategy for multiblock Omics data fusion

Citations: 299
Authors
Boccard, Julien [1]
Rutledge, Douglas N. [1]
Affiliations
[1] AgroParisTech, Chim Analyt Lab, F-75231 Paris, France
Keywords
Omics; Metabolomics; Data fusion; Multiblock; Consensus model; OPLS-DA; KERNEL ALGORITHM; CELL-DEATH; PLS-DA; IDENTIFICATION; VISUALIZATION; ARABIDOPSIS; PROJECTIONS; VARIABLES
DOI
10.1016/j.aca.2013.01.022
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification code
070302 [Analytical Chemistry]
Abstract
Omics approaches have proven their value to provide a broad monitoring of biological systems. However, as no single analytical technique is sufficient to reveal the full biochemical content of complex biological matrices or biofluids, the fusion of information from several data sources has become a decisive issue. Omics studies generate an increasing amount of massive data obtained from different analytical devices. These data are usually high dimensional and extracting knowledge from these multiple blocks is challenging. Appropriate tools are therefore needed to handle these datasets suitably. For that purpose, a generic methodology is proposed by combining the strengths of established data analysis strategies, i.e. multiple kernel learning and OPLS-DA to offer an efficient tool for the fusion of Omics data obtained from multiple sources. Three real case studies are proposed to assess the potential of the method. A first example illustrates the fusion of mass spectrometry-based metabolomic data acquired in both negative and positive electrospray ionisation modes, from leaf samples of the model plant Arabidopsis thaliana. A second dataset involves the classification of wine grape varieties based on polyphenolic extracts analysed by two-dimensional heteronuclear magnetic resonance spectroscopy. A third case study underlines the ability of the method to combine heterogeneous data from systems biology with the analysis of publicly available data related to NCI-60 cancer cell lines from different tissue origins, which include metabolomics, transcriptomics and proteomics. The fusion of Omics data from different sources is expected to provide a more complete view of biological systems. The proposed method was demonstrated as a relevant and widely applicable alternative to handle efficiently the inherent characteristics of multiple Omics data, such as very large numbers of noisy collinear variables. (C) 2013 Elsevier B.V. All rights reserved.
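The abstract names multiple kernel learning as the fusion ingredient but does not give the formulas. As a rough illustration only, the sketch below shows one common way to fuse blocks in that spirit: compute a sample-by-sample linear kernel for each block, scale each kernel to unit Frobenius norm so that no block dominates through its size or units, and sum the results. The function names, the centring step, the normalisation choice and the random stand-in data are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def linear_kernel(block):
    """Return the sample-by-sample linear kernel of a column-centred block."""
    centred = block - block.mean(axis=0)
    return centred @ centred.T


def consensus_kernel(blocks):
    """Sum per-block linear kernels after scaling each to unit Frobenius norm.

    Assumption: equal block weights and Frobenius scaling are illustrative
    choices, not necessarily those used in the paper.
    """
    kernels = [linear_kernel(b) for b in blocks]
    return sum(k / np.linalg.norm(k, "fro") for k in kernels)


rng = np.random.default_rng(0)
n_samples = 12
# Hypothetical blocks: the same samples measured with different numbers of
# variables and on very different scales.
block_a = rng.normal(size=(n_samples, 500))
block_b = 1000.0 * rng.normal(size=(n_samples, 40))

K = consensus_kernel([block_a, block_b])
print(K.shape)
```

The fused matrix has one row and one column per sample regardless of how many variables each block contains, which is why a kernel-based classifier can be fitted on it when the blocks are very high dimensional.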
Pages: 30-39
Page count: 10