Multiple sequence alignment: In pursuit of homologous DNA positions

Cited by: 91
Authors
Kumar, Sudhir [1]
Filipski, Alan
Affiliations
[1] Arizona State Univ, Ctr Evolutionary Funct Genom, Biodesign Inst, Tempe, AZ 85287 USA
[2] Arizona State Univ, Sch Life Sci, Tempe, AZ 85287 USA
DOI
10.1101/gr.5232407
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
DNA sequence alignment is a prerequisite to virtually all comparative genomic analyses, including the identification of conserved sequence motifs, estimation of evolutionary divergence between sequences, and inference of historical relationships among genes and species. While it is common sense that inaccuracies in multiple sequence alignments can have detrimental effects on downstream analyses, it is important to know the extent to which the inferences drawn from these alignments are robust to the errors and biases inherent in all sequence alignments. A survey of investigations into the strengths and weaknesses of sequence alignments reveals, as expected, that alignment quality is generally poor for two distantly related sequences and can often be improved by adding intermediate sequences as stepping stones between distantly related species. Errors in sequence alignment are also found to have a significant negative effect on subsequent inference of sequence divergence, phylogenetic trees, and conserved motifs. However, our understanding of alignment biases remains rudimentary, and sequence alignment procedures continue to be used somewhat like benign formatting operations that merely make sequences equal in length. Because of the central role these alignments now play in our endeavors to establish the tree of life and to identify important parts of genomes through evolutionary functional genomics, we see a need for increased community effort to investigate the influence of alignment bias on the accuracy of large-scale comparative genomics.
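To make the abstract's point about divergence estimation concrete, here is a minimal Python sketch; it is not the authors' method and is not taken from the paper. It aligns two hypothetical toy sequences with a textbook Needleman-Wunsch global alignment (match = 1, mismatch = -1, linear gap = -2, all assumed values) and compares the resulting p-distance and Jukes-Cantor (JC69) distance with those from a naive alignment that simply pads the shorter sequence to equal length. Excluding gap columns from the distance (pairwise deletion) is also an assumption made for illustration.

```python
import math

# Assumed toy scoring scheme (illustrative only, not from the paper).
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(x, y):
    """Substitution score for two nucleotides under the toy scheme."""
    return MATCH if x == y else MISMATCH


def needleman_wunsch(a, b):
    """Global pairwise alignment with a linear gap penalty (Needleman-Wunsch)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + GAP,  # gap inserted in b
                score[i][j - 1] + GAP,  # gap inserted in a
            )
    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(x, y):
    """Proportion of differing sites, skipping columns that contain a gap."""
    pairs = [(c, d) for c, d in zip(x, y) if c != "-" and d != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(c != d for c, d in pairs) / len(pairs)


def jukes_cantor(p):
    """JC69 distance d = -3/4 * ln(1 - 4p/3); undefined for p >= 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log1p(-4.0 * p / 3.0)


seq1, seq2 = "ACGTACGT", "ACGACGT"  # hypothetical toy sequences
optimal = needleman_wunsch(seq1, seq2)
padded = (seq1, seq2 + "-")  # "formatting" alignment: pad the shorter sequence at its end
for label, (x, y) in [("optimal", optimal), ("end-padded", padded)]:
    p = p_distance(x, y)
    print(f"{label}: {x} / {y}  p = {p:.3f}  JC69 d = {jukes_cantor(p):.3f}")
```

With these assumed scores, the optimal alignment places a single gap opposite the T at position 4 of seq1 and yields p = 0, whereas padding the shorter sequence at its end yields p = 4/7 (about 0.57) for the same two sequences. The divergence estimate is a property of the alignment, not only of the sequences.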
Pages: 127–135
Page count: 9