Representative subset selection

Cited by: 264
Authors
Daszykowski, M [1]
Walczak, B [1]
Massart, DL [1]
Affiliations
[1] VUB, FABI, Pharmaceut Inst ChemoAC, B-1090 Brussels, Belgium
Keywords
data mining; subset selection; uniform design
DOI
10.1016/S0003-2670(02)00651-7
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
The rapid development of analytical techniques makes it possible to acquire huge amounts of data. Large data sets are difficult to handle, so there is great interest in designing a subset of the original data set that preserves its information and facilitates computation. Many subset selection methods exist, and the choice among them depends on the problem at hand. The two most popular groups of subset selection methods are uniform designs and cluster-based designs. The methods considered in this paper include uniform designs, such as the Kennard and Stone design and OptiSim, and cluster-based designs that apply the K-means technique and density-based spatial clustering of applications with noise (DBSCAN). Additionally, a new concept of subset selection with K-means is introduced. (C) 2002 Elsevier Science B.V. All rights reserved.
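To illustrate the uniform-design idea named in the abstract, here is a minimal sketch of Kennard and Stone selection as it is commonly described: seed the subset with the two mutually most distant objects, then repeatedly add the object whose distance to its nearest already-selected object is largest. The function name `kennard_stone`, its parameters, the Euclidean metric, and the toy data are assumptions made for this sketch, not details taken from the paper.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kennard_stone(points, k):
    """Return indices of k objects chosen by a maximin (Kennard-Stone style) rule.

    Illustrative sketch only: O(n^2) seeding, Euclidean distance assumed.
    """
    n = len(points)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(points)")

    # Seed with the two mutually most distant objects.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]

    # nearest[i] = distance from object i to its closest selected object.
    nearest = [
        min(dist(p, points[first]), dist(p, points[second])) for p in points
    ]

    while len(selected) < k:
        chosen = set(selected)
        # Pick the unselected object that is farthest from the current subset.
        candidate = max(
            (i for i in range(n) if i not in chosen),
            key=lambda i: nearest[i],
        )
        selected.append(candidate)
        # Update nearest-selected distances with the newly added object.
        nearest = [
            min(d, dist(p, points[candidate])) for d, p in zip(nearest, points)
        ]
    return selected


# Toy data (hypothetical): five objects on a line.
data = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0), (9.0, 0.0)]
subset = kennard_stone(data, 3)
```

On this toy data the two extreme objects are picked first and the midpoint is added next, which is the even coverage of the data space that uniform designs aim for. A cluster-based design would instead pick representatives per cluster, so dense regions contribute in proportion to their structure, not their extent.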
Pages: 91-103
Number of pages: 13