GATA: a graphic alignment tool for comparative sequence analysis

Cited by: 62
Authors
Nix, DA
Eisen, MB
Affiliations
[1] Univ Calif Berkeley, Dept Mol & Cell Biol, Berkeley, CA 94720 USA
[2] Lawrence Berkeley Natl Lab, Dept Genome Sci, Div Life Sci, Berkeley, CA 94720 USA
DOI
10.1186/1471-2105-6-9
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution, yet the enhancer retains its activity. Dot plot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.
Results: To address some of these issues, we created a stand-alone, platform-independent, graphic alignment tool for comparative sequence analysis (GATA, http://gata.sourceforge.net/). GATA uses the NCBI-BLASTN program and extensive post-processing to identify all small sub-alignments above a low cut-off score. These are graphed as two shaded boxes, one for each sequence, connected by a line using the coordinate system of their parent sequence. Shading and colour are used to indicate score and orientation. A variety of options exist for querying, modifying and retrieving conserved sequence elements. Extensive gene annotation can be added to both sequences using a standardized General Feature Format (GFF) file.
Conclusions: GATA uses the NCBI-BLASTN program in conjunction with post-processing to exhaustively align two DNA sequences. It provides researchers with a fine-grained alignment and visualization tool aptly suited for non-coding, 0-200 kb, pairwise sequence analysis. It functions independently of sequence feature ordering or orientation, and readily visualizes both large and small sequence inversions, duplications, and segment shuffling. Since the alignment is visual and does not contain gaps, gene annotation can be added to both sequences to create a thoroughly descriptive picture of DNA conservation that is well suited for comparative sequence analysis.
Pages: 8