Parametric alignment of Drosophila genomes

Cited by: 31
Authors
Dewey, Colin N.
Huggins, Peter M.
Woods, Kevin
Sturmfels, Bernd
Pachter, Lior [1]
Affiliations
[1] Univ Calif Berkeley, Dept Math, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
DOI
10.1371/journal.pcbi.0020073
CLC classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The classic algorithms of Needleman-Wunsch and Smith-Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). Whole genomes that have undergone complex rearrangements are too large to align directly. Almost all existing whole-genome alignment methods therefore use fast heuristics to divide the genomes into small pieces that are suitable for Needleman-Wunsch alignment. In these methods, it is standard practice to fix the parameters and to produce a single alignment for subsequent analysis by biologists. As more alignment programs are applied at whole-genome scale, the disagreement among their results also increases. The alignments produced by different programs vary greatly, especially in non-coding regions of eukaryotic genomes, where the biologically correct alignment is hard to find. Parametric alignment is one possible remedy. This methodology resolves the issue of robustness to parameter changes by finding all optimal alignments for all possible parameter values in a PHMM. Our main result is the construction of a whole-genome parametric alignment of Drosophila melanogaster and Drosophila pseudoobscura. This alignment draws on existing heuristics for dividing whole genomes into small pieces for alignment. It also relies on advances we have made in computing convex polytopes, which allow us to parametrically align non-coding regions using biologically realistic models. We demonstrate the utility of our parametric alignment for biological inference by showing that cis-regulatory elements are more conserved between Drosophila melanogaster and Drosophila pseudoobscura than previously thought. We also show how whole-genome parametric alignment can be used to quantitatively assess how branch length estimates depend on alignment parameters.
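The abstract's central idea is to recover every optimal alignment for every parameter setting by working with convex polytopes rather than single scores. A toy model can illustrate it. The sketch below is a minimal illustration, not the authors' method or software, and it rests on one deliberately simplified assumption. The scoring scheme has only two parameters: match = +1, mismatch = −α, and a linear gap penalty of −β per gap column. The paper's biologically realistic PHMMs have more parameters than this.

Each alignment is summarized by its number of mismatches X and gap columns G. For sequences of lengths n and m, the number of matches is then fixed as M = (n + m − G)/2 − X, so the score M − αX − βG is affine in (X, G). The Needleman-Wunsch recursion is run over sets of summaries instead of numbers. At each cell it keeps only the vertices of the convex hull of the shifted predecessor sets. The vertices at the final cell are the only summaries that can be optimal for any (α, β). The function names `alignment_polytope`, `summary_score` and `nw_score` are hypothetical.

```python
import math


def convex_hull(points: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Vertices of the 2D convex hull (Andrew's monotone chain); drops collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: list[tuple[int, int]] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: list[tuple[int, int]] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def alignment_polytope(a: str, b: str) -> list[tuple[int, int]]:
    """Hull vertices of (mismatches, gap columns) over all global alignments of a and b.

    Polytope propagation: hull(union of shifted predecessor hulls) equals the hull of
    all summaries reaching that cell, so pruning to vertices at each cell loses nothing.
    """
    n, m = len(a), len(b)
    P: list[list[list[tuple[int, int]]]] = [[[] for _ in range(m + 1)] for _ in range(n + 1)]
    P[0][0] = [(0, 0)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cand: list[tuple[int, int]] = []
            if i > 0 and j > 0:  # align a[i-1] with b[j-1]
                dx = 0 if a[i - 1] == b[j - 1] else 1
                cand += [(x + dx, g) for x, g in P[i - 1][j - 1]]
            if i > 0:  # a[i-1] against a gap
                cand += [(x, g + 1) for x, g in P[i - 1][j]]
            if j > 0:  # b[j-1] against a gap
                cand += [(x, g + 1) for x, g in P[i][j - 1]]
            P[i][j] = convex_hull(cand)
    return P[n][m]


def summary_score(v: tuple[int, int], n: int, m: int, alpha: float, beta: float) -> float:
    """Score M - alpha*X - beta*G of a summary, with M recovered from n + m = 2(M + X) + G."""
    x, g = v
    matches = (n + m - g) // 2 - x
    return matches - alpha * x - beta * g


def nw_score(a: str, b: str, alpha: float, beta: float) -> float:
    """Ordinary Needleman-Wunsch optimum for one fixed (alpha, beta), used as a check."""
    n, m = len(a), len(b)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = -beta * i
    for j in range(1, m + 1):
        S[0][j] = -beta * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = 1.0 if a[i - 1] == b[j - 1] else -alpha
            S[i][j] = max(S[i - 1][j - 1] + s, S[i - 1][j] - beta, S[i][j - 1] - beta)
    return S[n][m]


if __name__ == "__main__":
    a, b = "GATTACA", "GCATGCT"
    verts = alignment_polytope(a, b)
    print("hull vertices (mismatches, gaps):", verts)
    for alpha, beta in [(0.5, 0.5), (2.0, 0.25), (0.25, 3.0)]:
        best = max(verts, key=lambda v: summary_score(v, len(a), len(b), alpha, beta))
        print(f"alpha={alpha}, beta={beta}: optimal summary {best}, score "
              f"{summary_score(best, len(a), len(b), alpha, beta)} "
              f"(Needleman-Wunsch: {nw_score(a, b, alpha, beta)})")
```

Each hull vertex is optimal on a region of the (α, β) plane, and these regions together partition the parameter space. This is why a single polytope computation can replace rerunning the alignment for every parameter choice. Pruning to hull vertices at every cell keeps the per-cell sets small. A direct comparison with `nw_score` on a grid of parameters is a cheap sanity check of the propagation.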
Pages: 606-614
Page count: 9