Directed molecular evolution by machine learning and the influence of nonlinear interactions

Cited by: 45
Authors
Fox, R [1]
Affiliation
[1] Codexis Inc, Redwood City, CA 94063 USA
Keywords
directed evolution; genetic algorithm; DNA shuffling; NK landscape; machine learning
DOI
10.1016/j.jtbi.2004.11.031
Chinese Library Classification
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Alternative search strategies for the directed evolution of proteins are presented and compared with one another. In particular, two machine learning strategies based on partial least-squares regression are developed: the first contains only linear terms, each representing a given residue's independent contribution to fitness; the second contains additional nonlinear terms to account for potential epistatic coupling between residues. The nonlinear modeling strategy is further divided into two types: one that contains all possible nonlinear terms, and another that uses a genetic algorithm to select a subset of important interaction terms. The performance of each modeling type is analysed as a function of training set size. Simulated molecular evolution on a synthetic protein landscape shows that using machine learning techniques to guide library design can be a powerful addition to library generation methods such as DNA shuffling. © 2004 Elsevier Ltd. All rights reserved.
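The distinction between the linear and nonlinear model types can be illustrated with a minimal sketch of how one row of a regression design matrix might be built. This is not the paper's code: the binary encoding (1 if a candidate mutation is present at a position, 0 otherwise), the restriction to pairwise products, and the function names `linear_features`, `pairwise_features`, and `design_row` are all assumptions made for illustration. The regression fit and the genetic algorithm are not shown; the `pairs` argument simply stands in for whatever subset of interaction terms a selection procedure would return.

```python
from itertools import combinations


def linear_features(variant):
    """Independent (additive) terms: one per position. Assumed 0/1 encoding."""
    return list(variant)


def pairwise_features(variant, pairs=None):
    """Interaction terms as products of two positions.

    pairs=None uses every pair of positions (the "all terms" model);
    a list of (i, j) tuples restricts the model to a chosen subset.
    """
    if pairs is None:
        pairs = list(combinations(range(len(variant)), 2))
    return [variant[i] * variant[j] for i, j in pairs]


def design_row(variant, pairs=None):
    """Concatenate linear and interaction terms into one design-matrix row."""
    return linear_features(variant) + pairwise_features(variant, pairs)


# Hypothetical variant with four candidate mutation sites.
variant = (1, 0, 1, 1)

# Linear terms plus all 6 pairwise terms.
full_row = design_row(variant)

# Linear terms plus a hand-picked subset of interaction terms.
subset_row = design_row(variant, pairs=[(0, 2), (2, 3)])

print(len(full_row), full_row)
print(len(subset_row), subset_row)
```

With n positions the full pairwise model has n(n-1)/2 interaction terms in addition to the n linear ones, so the number of coefficients grows quickly relative to the number of screened variants. This is one plausible reason to select a subset of terms and to study performance as a function of training set size.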
Pages: 187-199
Number of pages: 13