Sequential projection pursuit using genetic algorithms for data mining of analytical data

Cited by: 56
Authors
Guo, Q
Wu, W
Questier, F
Massart, DL
Boucon, C
de Jong, S
Affiliations
[1] Free Univ Brussels, Inst Pharmaceut, ChemoAC, B-1090 Brussels, Belgium
[2] Unilever Res Labs Vlaardingen, NL-3133 AT Vlaardingen, Netherlands
DOI
10.1021/ac0000123
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Sequential projection pursuit (SPP) is proposed to detect inhomogeneities (clusters) in high-dimensional analytical data. Such inhomogeneities indicate that there are groups of objects (samples) with different chemical characteristics. The method is compared with principal component analysis (PCA). PCA is generally applied to visually explore structure in high-dimensional data, but it is not specifically designed to find clustering tendency. Projection pursuit (PP) is specifically designed to find inhomogeneities, but the original method is computationally very intensive. SPP combines the advantages of both methods and overcomes most of their weak points. In this method, latent variables are obtained sequentially according to their importance, as measured by the entropy index. This involves an optimization step, which is carried out with a genetic algorithm. The performance of the method is demonstrated and evaluated, first on simulated data sets, and then on near-infrared and gas chromatography data sets. It is shown that SPP reveals information about inhomogeneities more easily than PCA.
Pages: 2846-2855
Number of pages: 10
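
The abstract describes extracting latent variables one at a time, ranking each by an entropy index and finding it with a genetic algorithm. The following is a minimal Python sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names (`entropy_index`, `ga_direction`, `spp`), the kernel-density estimate of the index, the bandwidth rule, the simple elitist evolutionary search standing in for the genetic algorithm, and the deflation step used to make the extraction sequential.

```python
import numpy as np

def entropy_index(z, bandwidth=None):
    """Assumed index: mean log kernel density of the standardized projection.
    Higher values mean lower entropy, i.e. a less Gaussian, more clustered view."""
    z = (z - z.mean()) / (z.std() + 1e-12)
    n = z.size
    h = bandwidth or 1.06 * n ** (-1 / 5)  # rule-of-thumb bandwidth (assumption)
    diffs = (z[:, None] - z[None, :]) / h
    dens = np.exp(-0.5 * diffs ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    return np.mean(np.log(dens))

def ga_direction(X, rng, pop_size=40, generations=50, sigma=0.2):
    """Elitist evolutionary search for a unit vector maximizing the index."""
    p = X.shape[1]
    pop = rng.normal(size=(pop_size, p))
    pop /= np.linalg.norm(pop, axis=1, keepdims=True)
    n_elite = pop_size // 4
    for _ in range(generations):
        fitness = np.array([entropy_index(X @ w) for w in pop])
        elite = pop[np.argsort(fitness)[::-1][:n_elite]]
        # Uniform crossover between two randomly chosen elite parents.
        a = elite[rng.integers(n_elite, size=pop_size - n_elite)]
        b = elite[rng.integers(n_elite, size=pop_size - n_elite)]
        children = np.where(rng.random(a.shape) < 0.5, a, b)
        # Gaussian mutation, then renormalize to unit length.
        children = children + rng.normal(scale=sigma, size=children.shape)
        children /= np.linalg.norm(children, axis=1, keepdims=True)
        pop = np.vstack([elite, children])
    fitness = np.array([entropy_index(X @ w) for w in pop])
    return pop[np.argmax(fitness)]

def spp(X, n_components=2, seed=0):
    """Extract directions one at a time, deflating the data after each."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    R = Xc.copy()
    directions = []
    for _ in range(n_components):
        w = ga_direction(R, rng)
        for v in directions:          # keep the directions orthonormal
            w = w - (w @ v) * v
        w = w / np.linalg.norm(w)
        directions.append(w)
        R = R - np.outer(R @ w, w)    # remove the variation already explained
    W = np.array(directions).T
    return Xc @ W, W                  # scores and loading vectors
```

Calling `spp(X, n_components=2)` on a samples-by-variables array returns the scores and the corresponding direction vectors; plotting the first two score columns gives the kind of view that would be compared against a PCA score plot. Because the index is computed on a standardized projection, it does not favor high-variance directions, which is the property that distinguishes this search from PCA.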