Prediction of protein retention times in anion-exchange chromatography systems using support vector regression

Cited by: 134
Authors
Song, MH
Breneman, CM
Bi, JB
Sukumar, N
Bennett, KP
Cramer, S
Tugcu, N
Affiliations
[1] Rensselaer Polytech Inst, Dept Chem, Troy, NY 12180 USA
[2] Rensselaer Polytech Inst, Dept Math, Troy, NY 12180 USA
[3] Rensselaer Polytech Inst, Dept Chem Engn, Troy, NY 12180 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2002 / Vol. 42 / No. 6
DOI
10.1021/ci025580t
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Quantitative Structure-Retention Relationship (QSRR) models are developed for the prediction of protein retention times in anion-exchange chromatography systems. Topological, subdivided surface area, and TAE (Transferable Atom Equivalent) electron-density-based descriptors are computed directly for a set of proteins using molecular connectivity patterns and crystal structure geometries. A novel algorithm based on Support Vector Machine (SVM) regression has been employed to obtain predictive QSRR models using a two-step computational strategy. In the first step, a sparse linear SVM was utilized as a feature selection procedure to remove irrelevant or redundant information. Subsequently, the selected features were used to produce an ensemble of nonlinear SVM regression models that were combined using bootstrap aggregation (bagging) techniques, where various combinations of training and validation data sets were selected from the pool of available data. A visualization scheme (star plots) was used to display the relative importance of each selected descriptor in the final set of "bagged" models. Once these predictive models have been validated, they can be used as an automated prediction tool for virtual high-throughput screening (VHTS).
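The two-step strategy described in the abstract (sparse linear feature selection, then a bagged ensemble of nonlinear regressors) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation. Two stand-ins are assumed: L1-penalised least squares replaces the sparse linear SVM, and a Gaussian-kernel weighted average replaces the nonlinear SVM regressor. The data are synthetic, all function names and parameter values (`lam`, `gamma`, `n_models`) are hypothetical, and the validation-set handling mentioned in the abstract is omitted.

```python
import math
import random


def lasso_weights(X, y, lam, n_sweeps=100):
    """L1-penalised least squares by coordinate descent.

    Stand-in for the sparse linear SVM used for feature selection.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(n_sweeps):
        for j in range(n_features):
            rho = 0.0   # correlation of feature j with the residual excluding feature j
            norm = 0.0  # squared length of feature column j
            for xi, yi in zip(X, y):
                partial = sum(w[k] * xi[k] for k in range(n_features) if k != j)
                rho += xi[j] * (yi - partial)
                norm += xi[j] ** 2
            # Soft-thresholding sets weakly correlated weights exactly to zero.
            if rho > lam:
                w[j] = (rho - lam) / norm
            elif rho < -lam:
                w[j] = (rho + lam) / norm
            else:
                w[j] = 0.0
    return w


def kernel_predict(X_train, y_train, x, gamma=2.0):
    """Gaussian-kernel weighted average (stand-in for a nonlinear SVM regressor)."""
    k = [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, x)))
         for xi in X_train]
    return sum(ki * yi for ki, yi in zip(k, y_train)) / sum(k)


def bagged_predict(X, y, x_new, n_models=25, seed=0):
    """Average the predictions of models built on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(X)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        preds.append(kernel_predict([X[i] for i in idx], [y[i] for i in idx], x_new))
    return sum(preds) / n_models


# Synthetic data (not from the paper): only descriptors 0 and 2 drive the response.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(6)] for _ in range(60)]
y = [2.0 * x[0] - 1.5 * x[2] + data_rng.gauss(0, 0.1) for x in X]

# Step 1: keep the descriptors whose sparse-model weight is nonzero.
weights = lasso_weights(X, y, lam=5.0)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]

# Step 2: bagged nonlinear model restricted to the selected descriptors.
X_sel = [[x[j] for j in selected] for x in X]
new_sample = [0.5, 0.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical unseen descriptor vector
x_new = [new_sample[j] for j in selected]
prediction = bagged_predict(X_sel, y, x_new)

print("selected descriptors:", selected)
print("bagged prediction:", round(prediction, 3))
```

Averaging over bootstrap resamples reduces the variance of any single model, which is the usual motivation for bagging when, as with a protein data set, the pool of training examples is small.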
Pages: 1347-1357
Page count: 11