Assessment of PLSDA cross validation

Times cited: 1139
Authors
Westerhuis, Johan A. [1]
Hoefsloot, Huub C. J. [1]
Smit, Suzanne [1]
Vis, Daniel J. [1]
Smilde, Age K. [1]
van Velzen, Ewoud J. J. [1,2]
van Duijnhoven, John P. M. [2]
van Dorsten, Ferdi A. [2]
Affiliations
[1] University of Amsterdam, Swammerdam Institute for Life Sciences, NL-1018 WV Amsterdam, Netherlands
[2] Unilever Food & Health Research Institute, NL-3133 AT Vlaardingen, Netherlands
Keywords
cross model validation; permutation testing; classification; PLSDA
DOI
10.1007/s11306-007-0099-6
Chinese Library Classification (CLC) number
R5 [Internal medicine]
Subject classification codes
1002; 100201
Abstract
Classifying groups of individuals based on their metabolic profiles is one of the main topics in metabolomics research. Because the number of individuals is low compared with the large number of variables, this is not an easy task. PLSDA is one of the data analysis methods used for such classification. Unfortunately, this method readily overfits the data, so rigorous validation is necessary. The validation, however, is far from straightforward. In this paper we discuss a strategy based on cross model validation and permutation testing to validate the classification models. It is also shown that overly optimistic results are obtained when the validation is not done properly. Furthermore, we advocate against the use of PLSDA score plots for inference of class differences.
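The validation strategy in the abstract combines two ideas. Cross model validation (CMV) is a double cross-validation loop in which model complexity is tuned only inside each outer training set. Permutation testing repeats the whole CMV on label-shuffled data. The following is a minimal sketch of both, not the authors' code or exact protocol. It rests on several assumptions:

- The problem has two classes coded 0/1.
- PLSDA is fitted as NIPALS PLS1 and classifies by thresholding the predicted response at 0.5.
- The test statistic is the number of misclassifications.
- The fold counts, the 1 to 3 component range, the 19 permutations and the synthetic data are illustrative choices.

All function names (`pls1_fit`, `cmv_errors`, `permutation_test`, ...) are hypothetical. The sketch uses only the Python standard library, so it runs without NumPy or scikit-learn.

```python
import random


def pls1_fit(X, y, ncomp):
    """Two-class PLS-DA: NIPALS PLS1 on a 0/1 coded response (hypothetical helper)."""
    n, p = len(X), len(X[0])
    xm = [sum(row[j] for row in X) / n for j in range(p)]
    ym = sum(y) / n
    E = [[row[j] - xm[j] for j in range(p)] for row in X]  # column-centred X
    f = [yi - ym for yi in y]  # centred y
    comps = []
    for _ in range(ncomp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # no covariance left between X and y
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]  # deflate X
        f = [f[i] - t[i] * q for i in range(n)]  # deflate y
        comps.append((w, load, q))
    return xm, ym, comps


def pls1_predict(model, x):
    """Predict class 0/1 by thresholding the PLS1 prediction at 0.5."""
    xm, ym, comps = model
    e = [xj - mj for xj, mj in zip(x, xm)]
    yhat = ym
    for w, load, q in comps:  # deflate the new sample exactly as in training
        t = sum(ej * wj for ej, wj in zip(e, w))
        yhat += t * q
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return 1 if yhat > 0.5 else 0


def make_folds(n, k, rng):
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_and_count(X, y, test, ncomp):
    """Fit on all samples not in `test`; return the number of misclassified test samples."""
    held = set(test)
    train = [i for i in range(len(X)) if i not in held]
    model = pls1_fit([X[i] for i in train], [y[i] for i in train], ncomp)
    return sum(pls1_predict(model, X[i]) != y[i] for i in test)


def cmv_errors(X, y, max_comp=3, k_outer=5, k_inner=4, seed=0):
    """Cross model validation: ncomp is chosen using the outer training set only."""
    rng = random.Random(seed)
    errs = 0
    for test in make_folds(len(X), k_outer, rng):
        held = set(test)
        train = [i for i in range(len(X)) if i not in held]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        inner = make_folds(len(Xtr), k_inner, rng)  # never sees the outer test fold
        best = min(range(1, max_comp + 1),
                   key=lambda a: sum(fit_and_count(Xtr, ytr, fold, a) for fold in inner))
        errs += fit_and_count(X, y, test, best)
    return errs


def permutation_test(X, y, n_perm=19, seed=1):
    """Repeat the whole CMV on label-permuted data to build a null distribution."""
    observed = cmv_errors(X, y)
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        null.append(cmv_errors(X, y_perm))
    p_value = (1 + sum(e <= observed for e in null)) / (1 + n_perm)
    return observed, null, p_value


if __name__ == "__main__":
    rng = random.Random(42)
    y = [0] * 10 + [1] * 10
    # synthetic data: only the first 2 of 10 variables differ between the classes
    X = [[rng.gauss(3.0 * yi if j < 2 else 0.0, 1.0) for j in range(10)] for yi in y]
    obs, null, p = permutation_test(X, y)
    print(f"CMV misclassifications: {obs}/20, permutation p-value: {p:.2f}")
```

The key design choice is that the number of components is selected inside each outer training set. Reporting the same cross-validation error that was used to pick the number of components would reuse the test samples for tuning, which is one route to the over-optimism the abstract warns about. With only 19 permutations the smallest attainable p-value is (1 + 0) / 20 = 0.05, so a real analysis would use many more permutations.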
Pages: 81-89
Number of pages: 9