MotifCut: regulatory motifs finding with maximum density subgraphs

被引:96
作者
Fratkin, Eugene [1 ]
Naughton, Brian T.
Brutlag, Douglas L.
Batzoglou, Serafim
机构
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Biochem, Stanford, CA 94305 USA
关键词
FACTOR-BINDING SITES; DNA; ALGORITHM; ELEMENTS; GENES; IDENTIFICATION; COEVOLUTION; CODE;
D O I
10.1093/bioinformatics/btl243
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: DNA motif finding is one of the core problems in computational biology, for which several probabilistic and discrete approaches have been developed. Most existing methods formulate motif finding as an intractable optimization problem and rely either on expectation maximization ( EM) or on local heuristic searches. Another challenge is the choice of motif model: simpler models such as the position-specific scoring matrix (PSSM) impose biologically unrealistic assumptions such as independence of the motif positions, while more involved models are harder to parametrize and learn. Results: We present MotifCut, a graph-theoretic approach to motif finding leading to a convex optimization problem with a polynomial time solution. We build a graph where the vertices represent all k-mers in the input sequences, and edges represent pairwise k-mer similarity. In this graph, we search for a motif as the maximum density subgraph, which is a set of k-mers that exhibit a large number of pairwise similarities. Our formulation does not make strong assumptions regarding the structure of the motif and in practice both motifs that fit well the PSSM model, and those that exhibit strong dependencies between position pairs are found as dense subgraphs. We benchmark MotifCut on both synthetic and real yeast motifs, and find that it compares favorably to existing popular methods. The ability of MotifCut to detect motifs appears to scale well with increasing input size. Moreover, the motifs we discover are different from those discovered by the other methods.
引用
收藏
页码:E150 / E157
页数:8
相关论文
共 37 条
  • [11] Favorov A.V., 2004, BGRS
  • [12] Friedman N., 2003, BIOINFORMATICS, V57
  • [13] Finding functional sequence elements by multiple local alignment
    Frith, MC
    Hansen, U
    Spouge, JL
    Weng, ZP
    [J]. NUCLEIC ACIDS RESEARCH, 2004, 32 (01) : 189 - 200
  • [14] A FAST PARAMETRIC MAXIMUM FLOW ALGORITHM AND APPLICATIONS
    GALLO, G
    GRIGORIADIS, MD
    TARJAN, RE
    [J]. SIAM JOURNAL ON COMPUTING, 1989, 18 (01) : 30 - 55
  • [15] Conservation and evolution of cis-regulatory systems in ascomycete fungi
    Gasch, AP
    Moses, AM
    Chiang, DY
    Fraser, HB
    Berardini, M
    Eisen, MB
    [J]. PLOS BIOLOGY, 2004, 2 (12) : 2202 - 2219
  • [16] TAMO: a flexible, object-oriented framework for analyzing transcriptional regulation using DNA-sequence motifs
    Gordon, DB
    Nekludova, L
    McCallum, S
    Fraenkel, E
    [J]. BIOINFORMATICS, 2005, 21 (14) : 3164 - 3165
  • [17] Transcriptional regulatory code of a eukaryotic genome
    Harbison, CT
    Gordon, DB
    Lee, TI
    Rinaldi, NJ
    Macisaac, KD
    Danford, TW
    Hannett, NM
    Tagne, JB
    Reynolds, DB
    Yoo, J
    Jennings, EG
    Zeitlinger, J
    Pokholok, DK
    Kellis, M
    Rolfe, PA
    Takusagawa, KT
    Lander, ES
    Gifford, DK
    Fraenkel, E
    Young, RA
    [J]. NATURE, 2004, 431 (7004) : 99 - 104
  • [18] Identifying DNA and protein patterns with statistically significant alignments of multiple sequences
    Hertz, GZ
    Stormo, GD
    [J]. BIOINFORMATICS, 1999, 15 (7-8) : 563 - 577
  • [19] Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae
    Hughes, JD
    Estep, PW
    Tavazoie, S
    Church, GM
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 2000, 296 (05) : 1205 - 1214
  • [20] JYOTI N, 1998, BIOCHEMISTRY-US, V95, P4935