Model population analysis for variable selection

Times cited: 121
Authors
Li, Hong-Dong [1]
Liang, Yi-Zeng [1]
Xu, Qing-Song [2]
Cao, Dong-Sheng [1]
Affiliations
[1] Cent S Univ, Res Ctr Modernizat Tradit Chinese Med, Coll Chem & Chem Engn, Changsha 410083, Hunan, Peoples R China
[2] Cent S Univ, Sch Math Sci, Changsha 410083, Hunan, Peoples R China
Keywords
model population analysis; variable selection; Monte Carlo sampling; biomarker discovery; partial least-squares; mass spectrometry; multivariate calibration; cross-validation; regression; classification; elimination; profiles; machine
DOI
10.1002/cem.1300
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
To build a credible model for chemical, biological, or clinical data, it may be helpful to first gain insight into the data itself before modeling, and then to present statistically stable results derived from a large number of sub-models built on a single dataset with the aid of Monte Carlo sampling (MCS). In the present work, the concept of model population analysis (MPA) is developed. Briefly, MPA can be considered a general framework for developing new methods by statistically analyzing parameters of interest (regression coefficients, prediction errors, etc.) from a number of sub-models. New methods are expected to arise from making full use of these parameters in novel ways. In this work, the elements of MPA are first described. Then, its applications to variable selection and model assessment are emphasized. Copyright (C) 2010 John Wiley & Sons, Ltd.
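The abstract's core idea (draw many Monte Carlo sub-samples from one dataset, fit a sub-model on each, then analyze the distribution of a parameter of interest) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mpa_coefficient_stability`, the use of ordinary least squares in place of PLS, the 80% sampling ratio, and the mean/standard-deviation stability score are all assumptions made for illustration.

```python
import numpy as np


def mpa_coefficient_stability(X, y, n_submodels=500, sample_ratio=0.8, seed=0):
    """Hypothetical MPA-style sketch: score each variable by the stability
    of its regression coefficient across Monte Carlo sub-models.

    Assumes more samples than variables in every sub-sample, so that
    ordinary least squares is well posed (the paper's own models may differ).
    """
    rng = np.random.default_rng(seed)
    n_samples, n_vars = X.shape
    n_draw = int(sample_ratio * n_samples)

    coefs = np.empty((n_submodels, n_vars))
    for i in range(n_submodels):
        # Monte Carlo sampling: random subset of samples, without replacement.
        idx = rng.choice(n_samples, size=n_draw, replace=False)
        # Center the sub-sample so that no intercept term is needed.
        Xs = X[idx] - X[idx].mean(axis=0)
        ys = y[idx] - y[idx].mean()
        # Fit one sub-model and keep its regression coefficients.
        coefs[i] = np.linalg.lstsq(Xs, ys, rcond=None)[0]

    # Statistical analysis of the parameter population:
    # a large |mean / std| suggests a consistently informative variable.
    return coefs.mean(axis=0) / coefs.std(axis=0, ddof=1)


# Synthetic demonstration: only the first two of six variables drive y.
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(100, 6))
y_demo = 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1] + demo_rng.normal(scale=0.5, size=100)

scores = mpa_coefficient_stability(X_demo, y_demo)
ranking = np.argsort(-np.abs(scores))  # most stable variables first
print(ranking)
```

Ranking variables by the absolute stability score is one simple way to turn the sub-model population into a variable-selection criterion; prediction errors of the sub-models could be collected and analyzed in the same loop for model assessment.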
Pages: 418-423
Page count: 6