Megavariate analysis of hierarchical QSAR data

Cited by: 44
Authors
Eriksson, L
Johansson, E
Lindgren, F
Sjöström, M
Wold, S
Affiliations
[1] Umetr AB, S-90719 Umea, Sweden
[2] Umetr AB, Malmo Off, SE-21142 Malmo, Sweden
[3] Umea Univ, Inst Chem, Umea, Sweden
Keywords
PCA; PLS; hierarchical modelling; multivariate analysis; megavariate analysis; QSAR; statistical molecular design (SMD);
DOI
10.1023/A:1022450725545
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Multivariate PCA and PLS models involving many variables are often difficult to interpret, because plots and lists of loadings, coefficients, VIPs, etc., rapidly become cluttered and hard to survey. There may then be a strong temptation to eliminate variables to obtain a smaller data set. Such a reduction of variables, however, often removes information and makes the modelling efforts less reliable: model interpretation may be misleading and predictive power may deteriorate. A better alternative is usually to partition the variables into blocks of logically related variables and apply hierarchical data analysis. Such blocked data may be analyzed by PCA and PLS; this modelling forms the base level of the hierarchical modelling set-up. On the base level, in-depth information is extracted for the different blocks. The score vectors formed on the base level, here called 'super variables', may be linked together in new matrices on the top level. On the top level, superficial relationships between the X- and the Y-data are investigated. In this paper the basic principles of hierarchical modelling by means of PCA and PLS are reviewed. One objective of the paper is to disseminate this concept to a broader QSAR audience. The hierarchical methods are used to analyze a set of 10 haloalkanes for which K = 30 chemical descriptors and M = 255 biological responses have been gathered. Because of the complexity of the biological data, they are subdivided into four blocks. All the modelling steps on the base level and the top level are reported, and the final QSAR model is interpreted thoroughly.
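The two-level scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the data are random placeholders with the dimensions given in the abstract (10 compounds, 30 descriptors, 255 responses), the split of the responses into four blocks of 64, 64, 64 and 63 columns is hypothetical, and the helper names `autoscale`, `pca_scores` and `first_pls_component` are invented for illustration. The top level is reduced to a single PLS component, obtained from the dominant singular vector of the cross-product matrix.

```python
import numpy as np

def autoscale(m):
    """Mean-centre each column and scale it to unit variance."""
    centred = m - m.mean(axis=0)
    std = centred.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return centred / std

def pca_scores(block, n_components):
    """Base level: first PCA score vectors of one autoscaled block."""
    u, s, _ = np.linalg.svd(autoscale(block), full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def first_pls_component(x, y):
    """Top level: weight vector and scores of the first PLS component.

    The first X-weight vector is the dominant left singular vector
    of X'Y computed on autoscaled data.
    """
    xs, ys = autoscale(x), autoscale(y)
    u, _, _ = np.linalg.svd(xs.T @ ys, full_matrices=False)
    w = u[:, 0]
    return w, xs @ w

# Placeholder data (assumption): random numbers, not the haloalkane data.
rng = np.random.default_rng(0)
n_compounds = 10
x_desc = rng.normal(size=(n_compounds, 30))
block_sizes = (64, 64, 64, 63)  # hypothetical split of the 255 responses
y_blocks = [rng.normal(size=(n_compounds, m)) for m in block_sizes]

# Base level: two score vectors ("super variables") per response block.
super_y = np.hstack([pca_scores(b, 2) for b in y_blocks])

# Top level: relate the descriptors to the super variables.
w, t = first_pls_component(x_desc, super_y)
print(super_y.shape, w.shape, t.shape)
```

Keeping two score vectors per block is an arbitrary choice here; in practice the number of components for each block would be chosen by cross-validation.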
Pages: 711-726
Number of pages: 16