Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning

Cited by: 1764
Authors
Alipanahi, Babak [1, 2]
Delong, Andrew [1]
Weirauch, Matthew T. [3, 4, 5, 6, 7]
Frey, Brendan J. [1, 2, 3, 4]
Affiliations
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON, Canada
[2] Univ Toronto, Donnelly Ctr Cellular & Biomol Res, Toronto, ON, Canada
[3] Canadian Inst Adv Res, Program Genet Networks, Toronto, ON, Canada
[4] Canadian Inst Adv Res, Program Neural Computat, Toronto, ON, Canada
[5] Cincinnati Childrens Hosp Med Ctr, Ctr Autoimmune Genom & Etiol, Cincinnati, OH 45229 USA
[6] Cincinnati Childrens Hosp Med Ctr, Div Biomed Informat, Cincinnati, OH 45229 USA
[7] Cincinnati Childrens Hosp Med Ctr, Div Dev Biol, Cincinnati, OH 45229 USA
Funding
Canadian Institutes of Health Research;
Keywords
TERT PROMOTER MUTATIONS; TRANSCRIPTION FACTORS; RECOGNITION; COMPLEXITY; NETWORKS; SEARCH; MODELS; CANCER; SITES;
DOI
10.1038/nbt.3300
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Knowing the sequence specificities of DNA- and RNA-binding proteins is essential for developing models of the regulatory processes in biological systems and for identifying causal disease variants. Here we show that sequence specificities can be ascertained from experimental data with 'deep learning' techniques, which offer a scalable, flexible and unified computational approach for pattern discovery. Using a diverse array of experimental data and evaluation metrics, we find that deep learning outperforms other state-of-the-art methods, even when training on in vitro data and testing on in vivo data. We call this approach DeepBind and have built a stand-alone software tool that is fully automatic and handles millions of sequences per experiment. Specificities determined by DeepBind are readily visualized as a weighted ensemble of position weight matrices or as a 'mutation map' that indicates how variations affect binding within a specific sequence.
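The 'mutation map' idea mentioned in the abstract can be illustrated with a minimal Python sketch: score a sequence, substitute each base in turn, and record how the score changes. This is not the DeepBind model or its software interface. The scoring function, the names `TOY_PWM`, `score` and `mutation_map`, and all numeric values are hypothetical stand-ins chosen for illustration; a simple position weight matrix scan takes the place of the trained network.

```python
# Toy position weight matrix (PWM): one dict of log-odds-like scores per motif
# position. The values are invented for illustration, NOT DeepBind parameters.
BASES = "ACGT"
TOY_PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]


def score(seq, pwm=TOY_PWM):
    """Stand-in binding score: the best PWM match over all windows of seq."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        sum(pwm[j][seq[i + j]] for j in range(width))
        for i in range(len(seq) - width + 1)
    )


def mutation_map(seq, score_fn=score):
    """Return {(position, base): score change} for each single-base substitution."""
    reference = score_fn(seq)
    deltas = {}
    for pos, original in enumerate(seq):
        for base in BASES:
            if base == original:
                continue  # skip the unmutated base
            mutant = seq[:pos] + base + seq[pos + 1:]
            deltas[(pos, base)] = score_fn(mutant) - reference
    return deltas
```

Calling `mutation_map("TACGT")` returns one entry per position and alternative base. Negative values mark substitutions that weaken the best motif match under this toy score, and values near zero mark positions the score does not depend on. Any other scoring function that maps a sequence to a number can be passed as `score_fn`.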
Pages: 831 / +
Page count: 9