A Nearly Exhaustive Search for CpG Islands on Whole Chromosomes

Cited by: 6
Authors
Hsieh, Fushing [1 ]
Chen, Shu-Chun [2 ]
Pollard, Katherine [3 ]
Affiliations
[1] Univ Calif Davis, Davis, CA 95616 USA
[2] Acad Sinica, Taipei, Taiwan
[3] Univ Calif San Francisco, San Francisco, CA 94143 USA
Keywords
AIC and BIC model selection criteria; non-parametric decoding; filtering criteria; hierarchical factor segmentation; human chromosome 21; mathematical incompleteness; methylation; comprehensive analysis
DOI
10.2202/1557-4679.1158
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
CpG islands are genome subsequences with an unexpectedly high number of CG dinucleotides. They are typically identified using filtering criteria (e.g., G+C%, observed-to-expected CpG ratio, and length) and are computed using sliding window methods. Most such studies mistakenly assume that an exhaustive search for CpG islands is achieved on the genome sequence of interest. We devise a Lexis diagram and explicitly show that filtering criteria-based definitions of CpG islands are mathematically incomplete and non-operational. These facts imply that sliding window methods frequently fail to identify a large percentage of the subsequences that meet the filtering criteria. We also demonstrate that an exhaustive search is computationally expensive. We develop the Hierarchical Factor Segmentation (HFS) algorithm, a pattern recognition technique with an adaptive model selection device, to overcome the incompleteness and non-operational drawbacks and to achieve effective computations for identifying CpG islands. The concept of a CpG island "core" is introduced and computed using the HFS algorithm, which is independent of any specific filtering criteria. A CpG island is then constructed around each such core using a Lexis diagram. This two-step computational approach provides a nearly exhaustive search for CpG islands that can be practically implemented on whole chromosomes. In a simulation study that realistically mimics CpG island dynamics through a Hidden Markov Model, we demonstrate that this approach retains very high sensitivity and specificity, that is, very low rates of false negatives and false positives. Finally, we apply the HFS algorithm to identify CpG island cores on human chromosome 21.
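To make the filtering criteria mentioned in the abstract concrete, here is a minimal Python sketch of a fixed-width sliding window scan. It is not the HFS algorithm or the Lexis diagram construction from the paper. The default thresholds (length of at least 200 bp, G+C fraction of at least 0.5, observed-to-expected CpG ratio of at least 0.6) are the commonly used Gardiner-Garden and Frommer style values and are assumptions here, since the abstract does not state specific numbers. The function names `cpg_stats`, `meets_criteria`, and `sliding_window_hits` are hypothetical.

```python
def cpg_stats(seq):
    """Return (length, G+C fraction, observed/expected CpG ratio) for a DNA string."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_frac = (c + g) / n if n else 0.0
    # Observed/expected CpG ratio: (#CpG * N) / (#C * #G); 0.0 if undefined
    oe = cg * n / (c * g) if c and g else 0.0
    return n, gc_frac, oe


def meets_criteria(seq, min_len=200, min_gc=0.5, min_oe=0.6):
    """Check one subsequence against the (assumed) filtering thresholds."""
    n, gc_frac, oe = cpg_stats(seq)
    return n >= min_len and gc_frac >= min_gc and oe >= min_oe


def sliding_window_hits(seq, window=200, step=1, min_gc=0.5, min_oe=0.6):
    """Start positions of fixed-width windows that pass the criteria.

    Only subsequences of exactly `window` bases starting on the `step`
    grid are examined, so qualifying subsequences of other lengths or
    offsets can be missed.
    """
    return [
        i
        for i in range(0, len(seq) - window + 1, step)
        if meets_criteria(seq[i:i + window], window, min_gc, min_oe)
    ]


if __name__ == "__main__":
    # Toy sequence: an AT-rich flank, a CG-rich block, another AT-rich flank
    toy = "AT" * 100 + "CG" * 100 + "AT" * 100
    print(sliding_window_hits(toy, window=200, step=50))
```

The scan checks only one window length on a fixed grid of start positions, whereas the criteria apply to every (start, length) pair. That gap is one way to read the abstract's point that sliding window methods are not an exhaustive search.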
Pages: 24