Sparse canonical methods for biological data integration: application to a cross-platform study

Cited by: 204
Authors
Le Cao, Kim-Anh [1 ,2 ,3 ]
Martin, Pascal G. P. [4 ]
Robert-Granie, Christele [1 ]
Besse, Philippe [2 ,3 ]
Affiliations
[1] INRA, Stn Ameliorat Genet Anim, UR 631, F-31326 Castanet Tolosan, France
[2] Univ Toulouse, Inst Math, F-31062 Toulouse, France
[3] CNRS, UMR 5219, F-31062 Toulouse, France
[4] INRA, UR 66, Lab Pharmacol & Toxicol, F-31931 Toulouse, France
Source
BMC BIOINFORMATICS | 2009 / Volume 10
Funding
U.S. National Science Foundation;
Keywords
PRINCIPAL COMPONENT ANALYSIS; LEAST-SQUARES REGRESSION; CO-INERTIA ANALYSIS; EXPRESSION; CANCER; METABOLITE; SHRINKAGE; SELECTION; LASSO;
DOI
10.1186/1471-2105-10-34
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: In the context of systems biology, few sparse approaches have been proposed so far to integrate several data sets. Data integration is nevertheless a fundamental issue that will be widely encountered in post-genomic studies, when transcriptomics, proteomics and metabolomics data from different platforms are analyzed simultaneously to understand the mutual interactions between the data sets. In this high-dimensional setting, variable selection is crucial to obtain interpretable results. We focus on a sparse Partial Least Squares (sPLS) approach to handle two-block data sets in which the relationship between the two types of variables is known to be symmetric. Sparse PLS has been developed for both regression and canonical correlation frameworks and includes a built-in procedure to select variables while integrating data. To illustrate the canonical mode, we analyzed the NCI60 data sets, in which two different platforms (cDNA and Affymetrix chips) were used to study the transcriptome of sixty cancer cell lines. Results: We compare the sPLS results with those obtained from two other sparse or related canonical correlation approaches: CCA with Elastic Net penalization (CCA-EN) and Co-Inertia Analysis (CIA). The latter does not include a built-in procedure for variable selection and requires a two-step analysis. We stress the lack of statistical criteria to evaluate canonical correlation methods, which makes biological interpretation necessary to compare the different gene selections. We also propose comprehensive graphical representations of both samples and variables to facilitate interpretation of the results. Conclusion: sPLS and CCA-EN selected highly relevant genes and complementary findings from the two data sets, which enabled a detailed understanding of the molecular characteristics of several groups of cell lines. These two approaches gave similar results, although they highlighted the same phenomena with different priorities. Both outperformed CIA, which tended to select redundant information.
Pages: 17