Use of cluster separation indices and the influence of outliers: application of two new separation indices, the modified silhouette index and the overlap coefficient to simulated data and mouse urine metabolomic profiles

Cited by: 17
Authors
Dixon, Sarah J. [1]
Heinrich, Nina [2]
Holmboe, Maria [2]
Schaefer, Michele L. [3]
Reed, Randall R. [3]
Trevejo, Jose [2]
Brereton, Richard G. [1]
Affiliations
[1] University of Bristol, School of Chemistry, Centre for Chemometrics, Bristol BS8 1TS, Avon, England
[2] Charles Stark Draper Laboratory, Inc., Cambridge, MA 02139, USA
[3] Johns Hopkins University, School of Medicine, Department of Molecular Biology and Genetics, Baltimore, MD 21205, USA
Keywords
cluster separation; Davies-Bouldin index; silhouette index; outliers; metabolomics; simulations; urine; mass spectrometry; robustness; volatiles; tests
DOI
10.1002/cem.1189
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
To quantify the separation between classes, four indices are compared: the Davies-Bouldin index, the silhouette width, and two new approaches described in this paper, namely a modified silhouette index based on the proportion of objects with a positive silhouette width, and the Overlap Coefficient. Four groups of simulated data are described, each consisting of 15 datasets with varying degrees of class overlap; the four groups differ in the nature of their outliers. Three experimental datasets are also described, consisting of gas chromatography-mass spectrometry (GC-MS) profiles of mouse urine extracts, obtained to study the effects of environmental (stress), physiological (diet) and developmental (age) factors on metabolic profiles. The paper discusses the robustness of each index to outliers and its ability to assess class separation. The two new indices protect against the influence of outliers. Copyright (C) 2008 John Wiley & Sons, Ltd.
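The abstract names the indices but gives no formulas. As a hedged illustration only, the Python sketch below computes three of them from class-labelled data using standard textbook definitions: the silhouette width s(i) = (b - a) / max(a, b) averaged over all objects, the proportion of objects with a positive silhouette width (assumed here to be what the modified silhouette index measures), and the Davies-Bouldin index with mean Euclidean distance to the class centroid as the within-class scatter. The Overlap Coefficient is not sketched because its definition is not given in this record. All function names and the toy data, including the single outlier, are hypothetical; this is not the authors' implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def group_by_class(points: Sequence[Point], labels: Sequence[str]) -> Dict[str, List[Point]]:
    """Collect the objects belonging to each class label."""
    groups: Dict[str, List[Point]] = defaultdict(list)
    for p, c in zip(points, labels):
        groups[c].append(p)
    return dict(groups)


def silhouette_widths(points: Sequence[Point], labels: Sequence[str]) -> List[float]:
    """Silhouette width s(i) = (b - a) / max(a, b) for every object.

    a = mean distance to the other members of its own class,
    b = smallest mean distance to the members of any other class.
    Requires at least two classes.
    """
    groups = group_by_class(points, labels)
    widths: List[float] = []
    for p, c in zip(points, labels):
        own = groups[c]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for a singleton class
            continue
        # The self-distance is 0, so summing over the whole class and dividing
        # by (n - 1) gives the mean distance to the other members.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in grp) / len(grp)
            for k, grp in groups.items()
            if k != c
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def mean_silhouette(points: Sequence[Point], labels: Sequence[str]) -> float:
    """Average silhouette width over all objects (higher = better separated)."""
    w = silhouette_widths(points, labels)
    return sum(w) / len(w)


def proportion_positive_silhouette(points: Sequence[Point], labels: Sequence[str]) -> float:
    """Hypothetical reading of the modified index: fraction of objects with s(i) > 0."""
    w = silhouette_widths(points, labels)
    return sum(1 for s in w if s > 0) / len(w)


def centroid(group: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a class."""
    n = len(group)
    return tuple(sum(coords) / n for coords in zip(*group))


def davies_bouldin(points: Sequence[Point], labels: Sequence[str]) -> float:
    """Davies-Bouldin index (lower = better separated).

    For each class k, take the worst ratio (S_k + S_j) / d(c_k, c_j) over the
    other classes j, then average over k. S_k is the mean Euclidean distance
    of class members to the centroid c_k.
    """
    groups = group_by_class(points, labels)
    cents = {k: centroid(g) for k, g in groups.items()}
    scatter = {k: sum(math.dist(p, cents[k]) for p in g) / len(g) for k, g in groups.items()}
    keys = list(groups)
    total = 0.0
    for i in keys:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in keys
            if j != i
        )
    return total / len(keys)


# Toy data: two well-separated square classes in 2-D.
class_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
class_b = [(10.0, 0.0), (11.0, 0.0), (10.0, 1.0), (11.0, 1.0)]
clean_x = class_a + class_b
clean_y = ["A"] * 4 + ["B"] * 4

# Same data plus one hypothetical outlier assigned to class A.
out_x = clean_x + [(30.0, 0.0)]
out_y = clean_y + ["A"]

for name, x, y in [("clean", clean_x, clean_y), ("with outlier", out_x, out_y)]:
    print(
        f"{name:>12}: DB={davies_bouldin(x, y):.3f}  "
        f"mean s={mean_silhouette(x, y):.3f}  "
        f"prop s>0={proportion_positive_silhouette(x, y):.3f}"
    )
```

On this toy data the single outlier pulls the mean silhouette width down and pushes the Davies-Bouldin index up sharply, because it inflates both the within-class distances and the class-A centroid scatter. The proportion of positive widths drops only by the one object whose width turns negative (the outlier itself). This illustrates the kind of robustness the abstract refers to; it is not a reproduction of the paper's results.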
Pages: 19-31
Number of pages: 13
References
23 in total
[1]   Brereton R.G., 2003, DATA ANAL LAB CHEM P
[2]   ASYMPTOTICS FOR THE MINIMUM COVARIANCE DETERMINANT ESTIMATOR [J].
BUTLER, RW ;
DAVIES, PL ;
JHUN, M .
ANNALS OF STATISTICS, 1993, 21 (03) :1385-1400
[4]   Robust statistics in data analysis - A review: Basic concepts [J].
Daszykowski, M. ;
Kaczmarek, K. ;
Heyden, Y. Vander ;
Walczak, B. .
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2007, 85 (02) :203-219
[5]   CLUSTER SEPARATION MEASURE [J].
DAVIES, DL ;
BOULDIN, DW .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1979, 1 (02) :224-227
[6]   Pattern recognition of gas chromatography mass spectrometry of human volatiles in sweat to distinguish the sex of subjects and determine potential discriminatory marker peaks [J].
Dixon, Sarah J. ;
Xu, Yun ;
Brereton, Richard G. ;
Soini, Helena A. ;
Novotny, Milos V. ;
Oberzaucher, Elisabeth ;
Grammer, Karl ;
Penn, Dustin J. .
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2007, 87 (02) :161-172
[7]   An automated method for peak detection and matching in large gas chromatography-mass spectrometry data sets [J].
Dixon, Sarah J. ;
Brereton, Richard G. ;
Soini, Helena A. ;
Novotny, Milos V. ;
Penn, Dustin J. .
JOURNAL OF CHEMOMETRICS, 2006, 20 (8-10) :325-340
[9]   PROCEDURES FOR DETECTING OUTLYING OBSERVATIONS IN SAMPLES [J].
GRUBBS, FE .
TECHNOMETRICS, 1969, 11 (01) :1-&
[10]   VARIATIONS IN MOUSE (MUS-MUSCULUS) URINARY VOLATILES DURING DIFFERENT PERIODS OF PREGNANCY AND LACTATION [J].
JEMIOLO, B ;
ANDREOLINI, F ;
WIESLER, D ;
NOVOTNY, M .
JOURNAL OF CHEMICAL ECOLOGY, 1987, 13 (09) :1941-1956