On the use of resampling tests for evaluating statistical significance of binding-site co-occurrence

Cited by: 12
Authors
Huen, David S. [1 ]
Russell, Steven [1 ,2 ]
Affiliations
[1] Univ Cambridge, Dept Genet, Cambridge CB2 3EH, England
[2] Cambridge Syst Biol Ctr, Cambridge CB2 1QR, England
Source
BMC BIOINFORMATICS | 2010 / Volume 11
Funding
Biotechnology and Biological Sciences Research Council (UK);
Keywords
Permutation Test; Hybrid Method; Null Distribution; Binding Profile; Slave Process;
DOI
10.1186/1471-2105-11-359
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: In eukaryotes, most DNA-binding proteins exert their action as members of large effector complexes. The presence of these complexes is revealed in high-throughput genome-wide assays by the co-occurrence of the binding sites of different complex components. Resampling tests are one route by which the statistical significance of apparent co-occurrence can be assessed. Results: We have investigated two resampling approaches for evaluating the statistical significance of binding-site co-occurrence. The permutation test approach was found to yield overly favourable p-values, while the independent resampling approach had the opposite effect and is of little use in practical terms. We have developed a new, pragmatically devised hybrid approach that, when applied to the experimental results of a Polycomb/Trithorax study, yielded p-values consistent with the findings of that study. We extended our investigations to the FL method developed by Haiminen et al., which derives its null distribution from all binding sites within a dataset, and show that the p-value computed for a pair of factors by this method can depend on which other factors are included in that dataset. Both our hybrid method and the FL method appeared to yield plausible estimates of the statistical significance of co-occurrences, although our hybrid method was more conservative when applied to the Polycomb/Trithorax dataset. A high-performance parallelized implementation of the hybrid method is available. Conclusions: We propose a new resampling-based co-occurrence significance test and demonstrate that it performs as well as or better than existing methods on a large experimentally derived dataset. We believe it can be usefully applied to data from high-throughput genome-wide techniques such as ChIP-chip or DamID. The Cooccur package, which implements our approach, accompanies this paper.
Pages: 13
Related papers
13 in total
[1]  
FU AQ, MOL BIOSYSTEMS
[2]  
Gabriel E, 2004, LECT NOTES COMPUT SC, V3241, P97
[3]   Determining significance of pairwise co-occurrences of events in bursty sequences [J].
Haiminen, Niina ;
Mannila, Heikki ;
Terzi, Evimaria .
BMC BIOINFORMATICS, 2008, 9 (1) :336
[4]   Predicting transcription factor synergism [J].
Hannenhalli, S ;
Levy, S .
NUCLEIC ACIDS RESEARCH, 2002, 30 (19) :4278-4284
[5]   Taspase1: A threonine aspartase required for cleavage of MLL and proper HOX gene expression [J].
Hsieh, JJD ;
Cheng, EHY ;
Korsmeyer, SJ .
CELL, 2003, 115 (03) :293-303
[6]   Proteolytic cleavage of MLL generates a complex of N- and C-terminal fragments that confers protein stability and subnuclear localization [J].
Hsieh, JJD ;
Ernst, P ;
Erdjument-Bromage, H ;
Tempst, P ;
Korsmeyer, SJ .
MOLECULAR AND CELLULAR BIOLOGY, 2003, 23 (01) :186-194
[7]   Enrichment of regulatory signals in conserved non-coding genomic sequence [J].
Levy, S ;
Hannenhalli, S ;
Workman, C .
BIOINFORMATICS, 2001, 17 (10) :871-877
[8]  
MOHDSARIP A, 2006, P NATL ACAD SCI USA, V103, P12027
[9]   Statistical detection of cooperative transcription factors with similarity adjustment [J].
Pape, Utz J. ;
Klein, Holger ;
Vingron, Martin .
BIOINFORMATICS, 2009, 25 (16) :2103-2109
[10]  
R Core Team, 2020, R foundation for statistical computing Computer software