Nonoverlapping clusters: Approximate distribution and application to molecular biology

Cited by: 8
Authors
Su, XP
Wallenstein, S
Bishop, D
Affiliations
[1] CUNY Mt Sinai Sch Med, Dept Biomath Sci, New York, NY 10029 USA
[2] CUNY Mt Sinai Sch Med, Dept Human Genet, New York, NY 10029 USA
Keywords
clustering; disease outbreaks; DNA sequence analysis; erythroid; promoters; regulatory regions; scan statistic; space-time clustering; transcription factors
DOI
10.1111/j.0006-341X.2001.00420.x
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
An approach is developed for the screening of genomic sequence data to identify gene regulatory regions. This approach is based on deciding if putative transcription factor binding sites are clustered together to a greater extent than one would expect by chance. Given n events occurring on an interval of width L (L base pairs), an r:w cluster is defined as r + 1 consecutive events all contained within a window of length wL. Accurate and easily computable approximations are derived for the distribution of the number of nonoverlapping r:w clusters under the model that the positions of the n events have a uniform distribution. Simulations demonstrate that these approximations have greater accuracy than existing methods. The approximation is applied to detect erythroid-specific regulatory regions in genomic DNA sequences, first in an artificial case where r is specified a priori and then as part of an exploratory approach.
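The cluster definition in the abstract can be made concrete with a short sketch. This is not the authors' analytic approximation. It counts nonoverlapping r:w clusters directly and estimates their null distribution by brute-force Monte Carlo simulation under uniformly distributed event positions. Assumptions, all labeled here and in the code: positions are rescaled from [0, L] to [0, 1], so the window wL becomes w; "nonoverlapping" is taken to mean that no two counted clusters share an event, with clusters counted greedily from left to right; the function names `count_nonoverlapping_clusters` and `simulate_null` are hypothetical.

```python
import random
from typing import Dict, Sequence


def count_nonoverlapping_clusters(positions: Sequence[float], r: int, window: float) -> int:
    """Count r:w clusters that share no events (assumed definition).

    A cluster is r + 1 consecutive events (after sorting) whose span is at
    most `window` (w * L in the abstract's notation). Clusters are taken
    greedily from the left; once one is found, its r + 1 events are skipped.
    """
    xs = sorted(positions)
    count = 0
    i = 0
    while i + r < len(xs):
        if xs[i + r] - xs[i] <= window:
            count += 1
            i += r + 1  # jump past every event used by this cluster
        else:
            i += 1
    return count


def simulate_null(n: int, r: int, w: float, trials: int = 10_000, seed: int = 0) -> Dict[int, float]:
    """Monte Carlo estimate of P(number of clusters = k).

    Assumes n event positions drawn independently and uniformly on [0, 1]
    (interval length rescaled to 1, so the window length is simply w).
    """
    rng = random.Random(seed)
    counts: Dict[int, int] = {}
    for _ in range(trials):
        pts = [rng.random() for _ in range(n)]
        k = count_nonoverlapping_clusters(pts, r, w)
        counts[k] = counts.get(k, 0) + 1
    return {k: c / trials for k, c in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical numbers: 30 putative binding sites, r = 2, w = 0.01,
    # and 3 observed clusters; estimate the upper-tail probability.
    observed = 3
    dist = simulate_null(n=30, r=2, w=0.01)
    p = sum(prob for k, prob in dist.items() if k >= observed)
    print(f"Estimated P(clusters >= {observed}) = {p:.4f}")
```

For example, sites at positions 3, 5, 8, 40, 41, 43 and 90 in a 100 bp region with r = 2 and a 10 bp window give two nonoverlapping clusters, (3, 5, 8) and (40, 41, 43). Because every candidate cluster spans exactly r + 1 consecutive events, taking the leftmost valid cluster first also yields the largest possible number of event-disjoint clusters.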
Pages: 420-426
Number of pages: 7