A Model-Based Approach to Study Nearest-Neighbor Influences Reveals Complex Substitution Patterns in Non-coding Sequences

Cited by: 23
Authors
Baele, Guy [1, 2, 3]
Van de Peer, Yves [2, 3]
Vansteelandt, Stijn [1]
Affiliations
[1] Univ Ghent, Dept Comp Sci & Appl Math, B-9000 Ghent, Belgium
[2] Univ Ghent VIB, Dept Plant Syst Biol, B-9052 Ghent, Belgium
[3] Univ Ghent, Dept Mol Genet, B-9052 Ghent, Belgium
Keywords
Bayes factor; context effect; context-dependent evolution; CpG effect; likelihood function; Markov chain Monte Carlo; nearest-neighbor influences; thermodynamic integration
DOI
10.1080/10635150802422324
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
In this article, we present a likelihood-based framework for modeling site dependencies. Our approach builds upon standard evolutionary models but incorporates site dependencies across the entire tree by letting the evolutionary parameters in these models depend upon the ancestral states at the neighboring sites. It thus avoids the need to introduce new, high-dimensional evolutionary models for site-dependent evolution. We propose a Markov chain Monte Carlo approach with data augmentation to infer the evolutionary parameters under our model. Although our approach allows for wide-ranging site dependencies, we illustrate its use on two non-coding datasets for the case of nearest-neighbor dependencies (i.e., evolution depending directly only upon the immediately flanking sites). The results reveal that the general time-reversible model with nearest-neighbor dependencies substantially improves the fit to the data compared with the corresponding model assuming site independence. Using the parameter estimates from our model, we elaborate on the importance of the 5-methylcytosine deamination process (i.e., the CpG effect) and show that this process also depends upon the identity of the 5' neighboring base. We hint at the possibility of a so-called TpA effect and show that, in light of the dinucleotide estimates, the observed substitution behavior is very complex. We also discuss the presence of CpG effects in a nuclear small subunit dataset and find significant evidence that evolutionary models incorporating context-dependent effects perform substantially better than independent-site models and in some cases even outperform models that incorporate varying rates across sites.
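The central idea, standard GTR machinery whose parameters are chosen by the states of the neighboring sites, can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are ours and not taken from the article. Each of the 16 (5' neighbor, 3' neighbor) contexts gets its own set of GTR exchangeabilities. Equilibrium frequencies are shared across contexts. The matrix used at a site is looked up from its neighbors in an augmented (imputed) ancestral sequence. The helper names (`gtr_rate_matrix`, `context_exchangeabilities`, `build_context_model`, `rate_matrix_for_site`) and the `cpg_boost` factor are hypothetical, and the numbers are illustrative rather than estimates from the article. The MCMC steps that impute ancestral states and update parameters are not shown.

```python
from itertools import product

BASES = "ACGT"
# The six unordered base pairs that each carry a GTR exchangeability parameter.
PAIRS = [("A", "C"), ("A", "G"), ("A", "T"), ("C", "G"), ("C", "T"), ("G", "T")]


def gtr_rate_matrix(freqs, exch):
    """Unnormalized GTR rate matrix as a dict of dicts: Q[i][j] = r_ij * pi_j."""
    q = {i: {j: 0.0 for j in BASES} for i in BASES}
    for a, b in PAIRS:
        r = exch[(a, b)]
        q[a][b] = r * freqs[b]
        q[b][a] = r * freqs[a]
    for i in BASES:
        # Diagonal entries make each row sum to zero.
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    return q


def context_exchangeabilities(left, right, cpg_boost=10.0):
    """Hypothetical parameterization: all exchangeabilities equal 1.0, except
    C<->T at a site whose 3' neighbor is G (CpG on this strand) and A<->G at a
    site whose 5' neighbor is C (CpG on the complementary strand)."""
    exch = {pair: 1.0 for pair in PAIRS}
    if right == "G":
        exch[("C", "T")] *= cpg_boost
    if left == "C":
        exch[("A", "G")] *= cpg_boost
    return exch


def build_context_model(freqs, cpg_boost=10.0):
    """One rate matrix per (5' neighbor, 3' neighbor) context: 16 in total."""
    return {
        (left, right): gtr_rate_matrix(freqs, context_exchangeabilities(left, right, cpg_boost))
        for left, right in product(BASES, repeat=2)
    }


def rate_matrix_for_site(ancestral_seq, k, model):
    """Select the rate matrix for interior site k from the states of its two
    neighbors in an augmented (imputed) ancestral sequence."""
    if not 0 < k < len(ancestral_seq) - 1:
        raise ValueError("boundary sites lack a neighbor on one side")
    return model[(ancestral_seq[k - 1], ancestral_seq[k + 1])]


freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
model = build_context_model(freqs)
print(len(model))  # 16
q = rate_matrix_for_site("ACGT", 1, model)  # site 1 is the C of a CpG
print(q["C"]["T"] > model[("A", "A")]["C"]["T"])  # True
```

Because each context keeps the GTR form, every per-context matrix remains time-reversible with respect to the shared frequencies. Site dependence enters only through which matrix is selected, which is how this kind of approach can reuse standard models instead of defining a new high-dimensional one.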
Pages: 675-692
Number of pages: 18