Statistical detection of cooperative transcription factors with similarity adjustment

Cited by: 7
Authors
Pape, Utz J. [1 ,2 ]
Klein, Holger [1 ]
Vingron, Martin [1 ]
Affiliations
[1] Max Planck Inst Mol Genet, Computat Mol Biol, D-14195 Berlin, Germany
[2] Free Univ Berlin, D-14195 Berlin, Germany
Keywords
CIS-REGULATORY MODULES; BINDING-SITES; TARGET GENES; COMPUTATIONAL IDENTIFICATION; POISSON APPROXIMATION; CLUSTERS; MOTIFS; REGIONS; TRANSFAC(R); IDENTIFY;
DOI
10.1093/bioinformatics/btp143
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Statistical assessment of cis-regulatory modules (CRMs) is a crucial task in computational biology. Usually, one concludes from exceptional co-occurrences of DNA motifs that the corresponding transcription factors (TFs) are cooperative. However, similar DNA motifs tend to co-occur in random sequences because overlapping occurrences have a high probability. Therefore, it is important to consider the similarity of DNA motifs in the statistical assessment.
Results: Based on previous work, we propose to adjust the window size for co-occurrence detection. Using the derived approximation, one obtains different window sizes for different sets of DNA motifs depending on their similarities. This ensures that the probability of co-occurrence in random sequences is equal across motif sets. Applying the approach to selected similar and dissimilar DNA motifs from human TFs shows the necessity of adjustment and confirms the accuracy of the approximation by comparison to simulated data. Furthermore, it becomes clear that approaches ignoring similarities strongly underestimate P-values for cooperativity of TFs with similar DNA motifs. In addition, the approach is extended to deal with overlapping windows. We derive Chen-Stein error bounds for the approximation. Comparing the error bounds for similar and dissimilar DNA motifs shows that the approximation for similar DNA motifs yields large bounds. Hence, one has to be careful when using overlapping windows. Based on the error bounds, one can precompute the approximation errors and select an appropriate overlap scheme before running the analysis.
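The effect described in the abstract can be illustrated with a small simulation. This is only a sketch under strong simplifying assumptions and is not the authors' method: motifs are exact words rather than position weight matrices, the background is uniform i.i.d. DNA, windows are non-overlapping, and co-occurrence probabilities are estimated by Monte Carlo sampling instead of the approximation derived in the paper. The function names `cooccurrence_prob` and `adjusted_window`, the example motifs, and all parameter values are hypothetical.

```python
import random


def cooccurrence_prob(motif_a, motif_b, window, n_windows=20000, seed=0):
    """Estimate the probability that both motifs occur in one random window.

    Assumption: uniform i.i.d. background over A, C, G, T and exact word
    matching (a simplification of PWM-based motif hits).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_windows):
        seq = "".join(rng.choices("ACGT", k=window))
        if motif_a in seq and motif_b in seq:
            hits += 1
    return hits / n_windows


def adjusted_window(motif_a, motif_b, target_prob, max_window, **kwargs):
    """Scan window sizes upward and return the largest one whose estimated
    co-occurrence probability does not exceed target_prob.

    The scan stops at the first window size that exceeds the target.
    """
    best = None
    start = max(len(motif_a), len(motif_b))
    for w in range(start, max_window + 1):
        if cooccurrence_prob(motif_a, motif_b, w, **kwargs) <= target_prob:
            best = w
        else:
            break
    return best


# Hypothetical example motifs: the similar pair can overlap (ACGTA contains
# both words), the dissimilar pair shares no suffix/prefix and cannot overlap.
reference_window = 20
p_dissimilar = cooccurrence_prob("ACGT", "GGCC", reference_window)
p_similar = cooccurrence_prob("ACGT", "CGTA", reference_window)

# Shrink the window for the similar pair until its co-occurrence probability
# in random sequence is no larger than that of the dissimilar pair.
w_similar = adjusted_window("ACGT", "CGTA", p_dissimilar, reference_window)

print("dissimilar pair, window", reference_window, ":", p_dissimilar)
print("similar pair,    window", reference_window, ":", p_similar)
print("adjusted window for similar pair:", w_similar)
```

With a common window size, the similar pair co-occurs far more often in random sequence than the dissimilar pair, so treating both pairs alike would overstate the significance of the similar pair. Shrinking the window for the similar pair brings the two background probabilities roughly into line, which is the idea behind the similarity-dependent window sizes.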
Pages: 2103-2109
Number of pages: 7