Several machine learning algorithms have recently been applied to modeling the specificity of HIV-1 protease. The problem is challenging for three reasons: (1) datasets with high dimensionality and a small number of samples can mislead classification modeling and its interpretation; (2) symbolic interpretation is desirable because it provides insight into the specificity in the form of human-understandable rules, and thus helps in designing effective HIV inhibitors; (3) the interpretation should take into account the complexity of, or dependency between, positions in sequences. Therefore, it is necessary to investigate multivariate and feature-selective methods to model the specificity and to extract rules from the model. We have extensively tested various machine learning methods and found that the combination of neural networks and a decompositional approach can generate a set of effective rules. When validated against experimental results for the HIV-1 protease, these specificity rules outperform those generated by frequency-based, univariate, or black-box methods. (C) 2007 Elsevier Ltd. All rights reserved.
Chen, LM; Perlina, A; Lee, CJ
Affiliation: Univ Calif Los Angeles, Ctr Genom & Proteom, Inst Mol Biol, Dept Chem & Biochem, Los Angeles, CA 90095 USA