Application of SIMPLISMA purity function for variable selection in multivariate regression analysis: A case study of protein secondary structure determination from infrared spectra

Cited by: 18
Authors
Bogomolov, Andrey
Hachey, Michel
Affiliations
[1] Adv Chem Dev Inc, Toronto, ON M5C 1T4, Canada
[2] European Mol Biol Lab, Hamburg Outstn, D-22603 Hamburg, Germany
Keywords
variable selection; PLS; SIMPLISMA; purity function; protein secondary structure
DOI
10.1016/j.chemolab.2006.07.006
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
A novel approach for the pre-selection of wavelengths, to be used in combination with Partial Least Squares (PLS) or other multivariate regression techniques, is presented. This variable selection method uses the purity function, originally suggested in the SIMPLe-to-use Interactive Self-modeling Mixture Analysis (SIMPLISMA) algorithm, to map out the regions of potentially influential variables. The selected intervals are then individually tested in practical modeling and prediction, and an optimal subset of variables is obtained. The algorithm is simple and intuitive and does not rely on iterative variable searches. The method was tested on a set of infrared protein spectra in order to improve the quantitative determination of the fractions of two secondary structure elements, alpha-helices and beta-strands (beta-sheets), in the protein polypeptide chain. Results comparable to those obtained through interval PLS (iPLS), an exhaustive search-based algorithm, were achieved in this study. The method was shown to be particularly beneficial when the variables are weighted by their inverse standard deviation. (C) 2006 Elsevier B.V. All rights reserved.
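The purity-based pre-selection step described in the abstract can be illustrated with a minimal sketch. It assumes the common first-order SIMPLISMA purity definition (standard deviation of each variable divided by its mean plus a small offset) and a simple threshold rule for mapping candidate regions. The abstract does not specify the offset, the threshold, or how intervals are delimited, so `offset_fraction`, `threshold`, and the helper names `purity` and `candidate_intervals` are hypothetical, and the toy data are invented for illustration. The later step of testing each interval in a PLS model is not shown.

```python
from statistics import mean, pstdev


def purity(spectra, offset_fraction=0.05):
    """Per-variable purity: std / (mean + offset).

    `spectra` is a list of rows (one spectrum per row). The offset is taken
    as a fraction of the largest column mean; this choice is an assumption.
    """
    columns = list(zip(*spectra))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    offset = offset_fraction * max(means)
    return [s / (m + offset) for s, m in zip(stds, means)]


def candidate_intervals(values, threshold):
    """Inclusive (start, end) index pairs of contiguous runs >= threshold."""
    intervals, start = [], None
    for i, v in enumerate(values):
        if v >= threshold and start is None:
            start = i
        elif v < threshold and start is not None:
            intervals.append((start, i - 1))
            start = None
    if start is not None:
        intervals.append((start, len(values) - 1))
    return intervals


# Toy "spectra": 3 samples x 5 wavelengths (invented values, not real data).
spectra = [
    [1.0, 1.0, 0.2, 0.9, 1.0],
    [1.0, 1.1, 0.8, 0.1, 1.0],
    [1.0, 0.9, 0.5, 0.5, 1.0],
]
p = purity(spectra)
# Hypothetical rule: keep regions whose purity is at least half the maximum.
intervals = candidate_intervals(p, threshold=0.5 * max(p))
print(intervals)
```

Each interval returned this way would then be evaluated separately in a regression model, which is where this sketch stops. Because the intervals come from a single pass over the purity values, no iterative search over variable subsets is needed at this stage.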
Pages: 132-142
Number of pages: 11