Predicting success of oligomerized pool engineering (OPEN) for zinc finger target site sequences

Cited by: 14
Authors
Sander, Jeffry D. [1 ,2 ,3 ,4 ]
Reyon, Deepak [4 ]
Maeder, Morgan L. [1 ,2 ,5 ]
Foley, Jonathan E. [1 ,2 ]
Thibodeau-Beganny, Stacey [1 ,2 ]
Li, Xiaohong [6 ]
Regan, Maureen R. [1 ,2 ]
Dahlborg, Elizabeth J. [1 ,2 ]
Goodwin, Mathew J. [1 ,2 ]
Fu, Fengli [4 ]
Voytas, Daniel F. [6 ]
Joung, J. Keith [1 ,2 ,3 ,5 ]
Dobbs, Drena [4 ]
Affiliations
[1] Massachusetts Gen Hosp, Mol Pathol Unit, Ctr Canc Res, Charlestown, MA 02129 USA
[2] Massachusetts Gen Hosp, Ctr Computat & Integrat Biol, Charlestown, MA 02129 USA
[3] Harvard Univ, Sch Med, Dept Pathol, Boston, MA 02115 USA
[4] Iowa State Univ, Dept Genet Dev & Cell Biol, Interdept Grad Program Bioinformat & Computat Bio, Ames, IA 50011 USA
[5] Harvard Univ, Sch Med, Biol & Biomed Sci Program, Boston, MA 02115 USA
[6] Univ Minnesota, Ctr Genome Engn, Dept Genet Cell Biol & Dev, Minneapolis, MN 55455 USA
Source
BMC BIOINFORMATICS | 2010 / Vol. 11
Keywords
DNA-BINDING; GENE-THERAPY; NUCLEASES; PROTEINS; DESIGN; CELLS; CLASSIFICATION; CONSTRUCTION; RECOGNITION; ALGORITHMS;
DOI
10.1186/1471-2105-11-543
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: Precise and efficient methods for gene targeting are critical for detailed functional analysis of genomes and regulatory networks and for potentially improving the efficacy and safety of gene therapies. Oligomerized Pool ENgineering (OPEN) is a recently developed method for engineering C2H2 zinc finger proteins (ZFPs) designed to bind specific DNA sequences with high affinity and specificity in vivo. Because generation of ZFPs using OPEN requires considerable effort, a computational method for identifying the sites in any given gene that are most likely to be successfully targeted by this method is desirable.
Results: Analysis of the base composition of experimentally validated ZFP target sites identified important constraints on the DNA sequence space that can be effectively targeted using OPEN. Using alternate encodings to represent ZFP target sites, we implemented Naive Bayes and Support Vector Machine classifiers capable of distinguishing "active" targets, i.e., ZFP binding sites that can be targeted with a high rate of success, from those that are "inactive" or poor targets for ZFPs generated using current OPEN technologies. When evaluated using leave-one-out cross-validation on a dataset of 135 experimentally validated ZFP target sites, the best Naive Bayes classifier, designated ZiFOpT, achieved overall accuracy of 87% and specificity(+) of 90%, with an ROC AUC of 0.89. When challenged with a completely independent test set of 140 newly validated ZFP target sites, ZiFOpT performance was comparable in terms of overall accuracy (88%) and specificity(+) (92%), but with reduced ROC AUC (0.77). Users can rank potentially active ZFP target sites using a confidence score derived from the posterior probability returned by ZiFOpT.
Conclusion: ZiFOpT, a machine learning classifier trained to identify DNA sequences amenable for targeting by OPEN-generated zinc finger arrays, can guide users to target sites that are most likely to function successfully in vivo, substantially reducing the experimental effort required. ZiFOpT is freely available and incorporated in the Zinc Finger Targeter web server (http://bindr.gdcb.iastate.edu/ZiFiT).
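The workflow described in the abstract (a Naive Bayes classifier over target-site sequences, evaluated by leave-one-out cross-validation, with the posterior probability used as a ranking score) can be illustrated with a minimal sketch. This is not the ZiFOpT implementation: the per-position base encoding, the Laplace smoothing, the 0.5 decision threshold, the function names, and the tiny set of 9-bp sequences are all assumptions made for illustration, and the sequences are invented rather than experimental data. The sketch also assumes that specificity(+) means TP / (TP + FP).

```python
import math
from collections import Counter

BASES = "ACGT"


def train_nb(sites, labels, alpha=1.0):
    """Fit a position-specific Naive Bayes model (hypothetical encoding).

    sites  : equal-length DNA strings over ACGT
    labels : 1 for "active", 0 for "inactive"
    alpha  : Laplace smoothing pseudocount (an assumption of this sketch)
    """
    length = len(sites[0])
    model = {"prior": {}, "cond": {}}
    for cls in (0, 1):
        members = [s for s, y in zip(sites, labels) if y == cls]
        # Smoothed class prior.
        model["prior"][cls] = (len(members) + alpha) / (len(sites) + 2 * alpha)
        # Smoothed base frequencies at each position for this class.
        total = len(members) + alpha * len(BASES)
        model["cond"][cls] = []
        for pos in range(length):
            counts = Counter(s[pos] for s in members)
            model["cond"][cls].append(
                {b: (counts[b] + alpha) / total for b in BASES}
            )
    return model


def posterior_active(model, site):
    """Return P(active | site), usable as a confidence score for ranking."""
    log_score = {}
    for cls in (0, 1):
        lp = math.log(model["prior"][cls])
        for pos, base in enumerate(site):
            lp += math.log(model["cond"][cls][pos][base])
        log_score[cls] = lp
    # Normalize in log space to avoid underflow.
    top = max(log_score.values())
    norm = sum(math.exp(v - top) for v in log_score.values())
    return math.exp(log_score[1] - top) / norm


def leave_one_out(sites, labels, threshold=0.5):
    """Hold out each site once, train on the rest, and tally the outcomes."""
    tp = fp = tn = fn = 0
    for i in range(len(sites)):
        model = train_nb(sites[:i] + sites[i + 1:], labels[:i] + labels[i + 1:])
        pred = 1 if posterior_active(model, sites[i]) >= threshold else 0
        if pred == 1 and labels[i] == 1:
            tp += 1
        elif pred == 1 and labels[i] == 0:
            fp += 1
        elif pred == 0 and labels[i] == 0:
            tn += 1
        else:
            fn += 1
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": (tp + tn) / len(sites),
        # Assumed definition: specificity(+) = TP / (TP + FP).
        "specificity_plus": tp / (tp + fp) if (tp + fp) else float("nan"),
    }


# Invented toy sequences, for illustration only (not experimental data).
TOY_SITES = [
    "GAAGGTGCA", "GGAGACGTG", "GCTGAGGGA", "GTGGCAGAT",  # labeled active
    "TCATTCAAC", "CTTAACTCA", "ACTTTACCT", "TTCCATAAT",  # labeled inactive
]
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    print(leave_one_out(TOY_SITES, TOY_LABELS))
    full_model = train_nb(TOY_SITES, TOY_LABELS)
    # Rank candidate sites by posterior probability of being active.
    ranked = sorted(TOY_SITES, key=lambda s: posterior_active(full_model, s),
                    reverse=True)
    print(ranked[:3])
```

Sorting candidates by `posterior_active` mirrors the ranking use described in the abstract. An ROC AUC could be computed from the same held-out posteriors by sweeping the threshold instead of fixing it at 0.5.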
Pages: 11