Partial least squares discriminant analysis: taking the magic away

Times cited: 657
Authors
Brereton, Richard G. [1]
Lloyd, Gavin R. [2]
Affiliations
[1] Univ Bristol, Sch Chem, Bristol BS8 1TS, Avon, England
[2] Gloucestershire Hosp NHS Fdn Trust, Biophoton Res Unit, Gloucester GL1 3NN, England
Keywords
Partial Least Squares; Discrimination; Classification; Two Class Classifiers; Regression
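DOI
10.1002/cem.2609
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
080201 [Mechanical manufacturing and automation]
Abstract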
Partial least squares discriminant analysis (PLS-DA) has been available for nearly 20 years, yet it is poorly understood by most users. Using simple examples, it is shown graphically and algebraically that, for two classes of equal size, PLS-DA using one partial least squares (PLS) component gives classification results equivalent to Euclidean distance to centroids (EDC), and PLS-DA using all nonzero components gives results equivalent to linear discriminant analysis (LDA). Extensions to unequal class sizes and to more than two classes are discussed, including common pitfalls and dilemmas. Finally, the problems of overfitting and the pitfalls of PLS scores plots are discussed. It is concluded that, for classification purposes, PLS-DA has no significant advantages over traditional procedures and is an algorithm full of dangers. It should not be viewed as a single integrated method but as one step in a full classification procedure. Despite these limitations, however, PLS-DA can provide good insight into the causes of discrimination via weights and loadings. This gives it a unique role in exploratory data analysis, for example in metabolomics, where it can be used to visualise significant variables such as metabolites or spectroscopic peaks.
Copyright (c) 2014 John Wiley & Sons, Ltd.
PLS-DA is first described as a two-class classifier. It is shown that, under certain circumstances, its performance is identical to that of two well-established statistical approaches, namely EDC and LDA. Its extensions to unequal class sizes and to more than two groups are described, as are the pitfalls of using PLS scores plots. Common difficulties are discussed, and it is recommended that PLS-DA be considered a single algorithmic step within an overall classification strategy.
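A short numerical check can make the two equivalences concrete. Assume two equal-sized classes coded +1 and -1, with the data mean-centred. The first PLS weight vector is proportional to the centred X transposed times y, which reduces to a multiple of the difference between the class centroids. The PLS-DA boundary (prediction = 0) is therefore the perpendicular bisector of the two centroids, which is exactly the EDC boundary.

With as many components as the rank of the centred data, PLS reduces to ordinary least squares. Its coefficient vector is proportional to the inverse within-class scatter matrix times the centroid difference, which is the LDA direction.

The sketch below is not the authors' code. It rests on several assumptions:
- +1/-1 coding with a decision threshold of 0;
- synthetic Gaussian data with a shared covariance;
- a NIPALS PLS1 written here for illustration.

All function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Hypothetical helper: PLS1 by NIPALS on mean-centred X and y.

    Returns a function that predicts y for new rows of X.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # residuals, deflated after each component
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)          # unit-length weight vector
        t = E @ w                          # scores
        p = E.T @ t / (t @ t)              # X loadings
        q = (f @ t) / (t @ t)              # y loading
        E = E - np.outer(t, p)             # deflate X
        f = f - q * t                      # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    b = W @ np.linalg.solve(P.T @ W, Q)    # regression coefficients for centred X
    return lambda X_new: y_mean + (X_new - x_mean) @ b


def plsda_predict(X, labels, X_new, n_components):
    """Assumed PLS-DA rule: regress +1/-1 codes on X, classify by the sign of y-hat."""
    y_hat = pls1_fit(X, labels.astype(float), n_components)(X_new)
    return np.where(y_hat > 0, 1, -1)


def edc_predict(X, labels, X_new):
    """Euclidean distance to centroids: assign each row to the nearer class mean."""
    m_a, m_b = X[labels == 1].mean(axis=0), X[labels == -1].mean(axis=0)
    d_a = ((X_new - m_a) ** 2).sum(axis=1)
    d_b = ((X_new - m_b) ** 2).sum(axis=1)
    return np.where(d_a < d_b, 1, -1)


def lda_predict(X, labels, X_new):
    """LDA with equal priors: nearer centroid in Mahalanobis distance (pooled covariance)."""
    a, b = X[labels == 1], X[labels == -1]
    m_a, m_b = a.mean(axis=0), b.mean(axis=0)
    pooled = ((a - m_a).T @ (a - m_a) + (b - m_b).T @ (b - m_b)) / (len(X) - 2)
    inv = np.linalg.inv(pooled)
    d_a = np.einsum("ij,jk,ik->i", X_new - m_a, inv, X_new - m_a)
    d_b = np.einsum("ij,jk,ik->i", X_new - m_b, inv, X_new - m_b)
    return np.where(d_a < d_b, 1, -1)


# Synthetic, equal-sized classes sharing a correlated covariance (illustrative only).
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 0.5]])
X = np.vstack([rng.multivariate_normal([1.0, 0.5, 0.0], cov, size=20),
               rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=20)])
labels = np.repeat([1, -1], 20)            # assumed coding: class A = +1, class B = -1
X_new = rng.normal(scale=2.0, size=(200, 3))

edc = edc_predict(X, labels, X_new)
lda = lda_predict(X, labels, X_new)
pls_1 = plsda_predict(X, labels, X_new, n_components=1)
pls_all = plsda_predict(X, labels, X_new, n_components=3)  # rank of centred X is 3 here

print("1 component  == EDC:", np.array_equal(pls_1, edc))
print("3 components == LDA:", np.array_equal(pls_all, lda))
print("EDC == LDA:", np.array_equal(edc, lda))
```

By the argument above, the first two comparisons should print True. The third comparison is included to show that EDC and LDA themselves disagree on correlated data, so the agreement is not trivial. The all-component case assumes the centred data have full column rank, which holds here (40 samples, 3 variables).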
Pages: 213-225
Number of pages: 13