A method for clustering and screening of long-dimensional chemical data based on fingerprints and similarity measurements

Cited by: 5
Authors
Urbano Cuadrado, Manuel [1]
Cerruela Garcia, Gonzalo [1]
Luque Ruiz, Irene [1]
Angel Gomez-Nieto, Miguel [1]
Affiliations
[1] Univ Cordoba, Dept Comp & Numer Anal, E-14071 Cordoba, Spain
Keywords
data preparation; similarity calculation; fingerprints; clustering; screening
DOI
10.1007/s10910-006-9118-5
Chinese Library Classification number
O6 [Chemistry]
Subject classification code
0703 [Chemistry]
Abstract
A method for the treatment of long-dimensional chemical data arrays is presented in this work with the aim of maximising classification models. The method is based on the construction of fingerprints and the subsequent generation of a similarity matrix. The similarity calculation has been modified through a scaling process to take into account different significance shown by the variables. The method was applied to spectral measurements of wines and several aspects were studied, namely: threshold considered in the construction of fingerprints and patterns, weighting factor used for scaling, normalisation method, etc. The application of both Principal Components Analysis and Soft-Independent Modelling of Class Analogies to the similarity matrices gave better classifications of the information than those obtained using original data.
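The abstract names the steps (fingerprint construction with a threshold, a scaled similarity calculation, then Principal Components Analysis on the similarity matrix) but does not give the formulas. The following is only a minimal sketch of that pipeline under stated assumptions: min-max normalisation, binary fingerprints, and a weighted Tanimoto coefficient are all assumed here, and the function names, parameters, and toy data are hypothetical rather than the authors' implementation. Soft-Independent Modelling of Class Analogies is not sketched.

```python
import numpy as np

def fingerprints(data, threshold):
    """Assumed rule: a bit is set where the min-max normalised value exceeds threshold."""
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns: avoid division by zero
    return ((data - lo) / span > threshold).astype(float)

def weighted_tanimoto(fp, weights):
    """Assumed similarity: pairwise Tanimoto coefficient with per-variable weights."""
    common = (fp * weights) @ fp.T           # weighted count of bits shared by each pair
    totals = fp @ weights                    # weighted count of set bits per sample
    union = totals[:, None] + totals[None, :] - common
    # Two all-zero fingerprints are treated as identical (similarity 1).
    return np.divide(common, union, out=np.ones_like(common), where=union > 0)

def pca_scores(sim, n_components=2):
    """PCA scores of the column-centred similarity matrix, computed via SVD."""
    centred = sim - sim.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Toy data: 3 samples x 3 variables (hypothetical values, not from the paper).
data = np.array([[0.0, 1.0, 1.0],
                 [0.0, 1.0, 0.0],
                 [1.0, 0.0, 0.0]])
weights = np.array([1.0, 1.0, 2.0])          # hypothetical weighting factors

fp = fingerprints(data, threshold=0.5)
sim = weighted_tanimoto(fp, weights)
scores = pca_scores(sim)
```

In this sketch the third variable counts twice as much as the others, so samples that share it are pulled closer together in the similarity matrix. The threshold and the weights correspond to the kind of parameters the abstract says were studied.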
Pages: 15-27
Number of pages: 13