Megavariate analysis of environmental QSAR data. Part I - A basic framework founded on principal component analysis (PCA), partial least squares (PLS), and statistical molecular design (SMD)

Cited by: 134
Authors
Eriksson, Lennart
Andersson, Patrik L.
Johansson, Erik
Tysklind, Mats
Affiliations
[1] Umetrics AB, S-90719 Umea, Sweden
[2] Umea Univ, Dept Chem, Inst Environm Chem, S-90187 Umea, Sweden
Keywords
megavariate data analysis; PCA; PLS; SMD; QSAR
DOI
10.1007/s11030-006-9024-6
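Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular biology];
Discipline classification codes
071010 [Biochemistry and molecular biology]; 081704 [Applied chemistry];
Abstract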
This paper introduces principal component analysis (PCA), partial least squares projections to latent structures (PLS), and statistical molecular design (SMD) as useful tools in deriving multi- and megavariate quantitative structure-activity relationship (QSAR) models. Two QSAR data sets from the fields of environmental toxicology and environmental chemistry are worked out in detail, showing the benefits of PCA, PLS and SMD. PCA is useful when overviewing a data set and exploring relationships among compounds and relationships among variables. PLS is the regression extension of PCA and is used for establishing QSARs. SMD is essential for selecting informative training and test sets of compounds for QSAR calibration and validation.
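As an illustration of the overview step the abstract attributes to PCA, the following is a minimal sketch of extracting the first principal component with the NIPALS iteration on an autoscaled descriptor table. It is not taken from the paper: the data values, the function names `center_and_scale` and `nipals_component`, and the tolerance settings are all assumptions made for illustration. PLS and SMD are not shown here.

```python
import math

def center_and_scale(X):
    """Mean-center each column and scale it to unit variance (autoscaling)."""
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
           for j in range(k)]
    return [[(row[j] - means[j]) / sds[j] for j in range(k)] for row in X]

def nipals_component(X, tol=1e-12, max_iter=500):
    """Return scores t and unit-length loadings p of the first component."""
    n, k = len(X), len(X[0])
    t = [row[0] for row in X]  # start from the first column
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Loadings: project the columns of X onto the current scores.
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Scores: project the rows of X onto the normalized loadings.
        t_new = [sum(X[i][j] * p[j] for j in range(k)) for i in range(n)]
        diff = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if diff < tol * sum(v * v for v in t):
            break
    return t, p

# Hypothetical table: 5 compounds (rows) x 3 descriptors (columns).
X_raw = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.9],
    [3.0, 6.2, 1.6],
    [4.0, 8.1, 1.9],
    [5.0, 9.8, 2.6],
]

X = center_and_scale(X_raw)
t, p = nipals_component(X)

# Fraction of the total sum of squares captured by the first component.
total_ss = sum(v * v for row in X for v in row)
explained = sum(v * v for v in t) / total_ss
print("scores:", [round(v, 3) for v in t])
print("loadings:", [round(v, 3) for v in p])
print("explained fraction:", round(explained, 3))
```

The scores `t` place each compound along the dominant direction of variation, and the loadings `p` show how much each descriptor contributes to that direction, which corresponds to the two kinds of relationships the abstract says PCA helps to explore.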
Pages: 169-186
Number of pages: 18