Predicting the binding preference of transcription factors to individual DNA k-mers

Cited by: 23
Authors
Alleyne, Trevis M. [1]
Pena-Castillo, Lourdes [2]
Badis, Gwenael [2]
Talukder, Shaheynoor [1]
Berger, Michael F. [3, 4]
Gehrke, Andrew R. [3]
Philippakis, Anthony A. [3, 4, 5]
Bulyk, Martha L. [3, 4, 5, 6]
Morris, Quaid D. [1, 2]
Hughes, Timothy R. [1, 2]
Affiliations
[1] Univ Toronto, Dept Mol Genet, Toronto, ON M5S 3E1, Canada
[2] Univ Toronto, Banting & Best Dept Med Res, Toronto, ON M5S 3E1, Canada
[3] Brigham & Womens Hosp, Dept Med, Div Genet, Boston, MA 02115 USA
[4] Harvard Univ, Comm Higher Degrees Biophys, Cambridge, MA 02138 USA
[5] Harvard Univ, Sch Med, Harvard Mit Div Hlth Sci & Technol, Boston, MA 02115 USA
[6] Harvard Univ, Sch Med, Brigham & Womens Hosp, Dept Pathol, Boston, MA 02115 USA
Funding
National Institutes of Health (NIH), USA
Keywords
ENGRAILED HOMEODOMAIN; SEQUENCE-RECOGNITION; SPECIFICITY; PROTEIN; CODE; RESOLUTION; HELIX
DOI
10.1093/bioinformatics/btn645
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty associated with determining their specificity experimentally, and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA-protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members.

Results: We employed a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. Nearest neighbour among functionally important residues emerged among the most effective methods. Our results underscore the complexity of TF-DNA recognition, and suggest a rational approach for future analyses of TF families.
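
The nearest-neighbour idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the mismatch-count (Hamming) distance, the function names `residue_distance` and `nearest_neighbour_profile`, the toy aligned sequences, the choice of "important" positions, and the 8-mer scores are all assumptions made here purely for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# A binding profile maps a DNA k-mer to a relative preference score.
Profile = Dict[str, float]


def residue_distance(a: str, b: str, positions: Sequence[int]) -> int:
    """Count mismatches between two aligned protein sequences,
    looking only at the chosen alignment positions (assumed metric)."""
    return sum(a[i] != b[i] for i in positions)


def nearest_neighbour_profile(
    query: str,
    training: List[Tuple[str, Profile]],
    positions: Sequence[int],
) -> Profile:
    """Return the profile of the training protein closest to the query
    at the chosen positions (ties go to the earliest training entry)."""
    _, profile = min(
        training,
        key=lambda item: residue_distance(query, item[0], positions),
    )
    return profile


# Hypothetical toy data: two aligned five-residue "proteins" with made-up
# scores for two 8-mers. Real inputs would be aligned homeodomain sequences
# and measured k-mer preferences.
TRAINING: List[Tuple[str, Profile]] = [
    ("RKQTA", {"TAATTAAA": 0.9, "TGACAGCT": 0.1}),
    ("RRNTA", {"TAATTAAA": 0.2, "TGACAGCT": 0.8}),
]
KEY_POSITIONS = [1, 2]  # hypothetical "functionally important" columns

predicted = nearest_neighbour_profile("AKQSS", TRAINING, KEY_POSITIONS)
print(predicted)
```

Restricting the comparison to `KEY_POSITIONS` means residues elsewhere in the alignment do not influence which neighbour is chosen, which is the point of comparing only functionally important residues rather than whole sequences.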
Pages: 1012-1018
Number of pages: 7