Splign: algorithms for computing spliced alignments with identification of paralogs

Cited by: 277
Authors
Kapustin, Yuri [1]
Souvorov, Alexander [1]
Tatusova, Tatiana [1]
Lipman, David [1]
Affiliation
[1] National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, USA
Funding
UK Research and Innovation (UKRI)
DOI
10.1186/1745-6150-3-20
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Background: The computation of accurate alignments of cDNA sequences against a genome is at the foundation of modern genome annotation pipelines. Several factors, such as the presence of paralogs, small exons, non-consensus splice signals, sequencing errors and polymorphic sites, pose recognized difficulties for existing spliced alignment algorithms.
Results: We describe a set of algorithms behind a tool called Splign for computing cDNA-to-genome alignments. The algorithms include a high-performance preliminary alignment, a compartment identification based on a formally defined model of adjacent duplicated regions, and a refined sequence alignment. In a series of tests, Splign produced more accurate results than other tools commonly used to compute spliced alignments, in a reasonable amount of time.
Conclusion: Splign's ability to deal with the various issues that complicate the spliced alignment problem makes it a helpful tool in eukaryotic genome annotation and alternative splicing studies. Its performance is sufficient to align the largest currently available pools of cDNA data, such as the human EST set, on a moderate-sized computing cluster in a matter of hours. The duplication identification (compartmentization) algorithm can be used independently in other areas, such as the study of pseudogenes.
Reviewers: This article was reviewed by Steven Salzberg, Arcady Mushegian and Andrey Mironov (nominated by Mikhail Gelfand).
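The abstract names compartment identification (grouping preliminary hits into separate gene copies, such as a gene and its adjacent paralog) but does not spell out Splign's formal model. The sketch below is NOT that model; it swaps in a generic, well-known substitute: a quadratic dynamic program that finds the highest-scoring chain of hits colinear on both the cDNA and the genome, then greedily peels off one chain per compartment. The names `Hit`, `best_chain`, `compartments`, `max_intron` and `min_score` are hypothetical, and only plus-strand hits with positive lengths are assumed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    """Hypothetical local hit; coordinates are 0-based, end-exclusive, plus strand only."""
    q_start: int  # start on the cDNA (query)
    q_end: int
    s_start: int  # start on the genome (subject)
    s_end: int
    score: float


def best_chain(hits: List[Hit], max_intron: int) -> List[Hit]:
    """Return the max-score chain of hits colinear on both sequences (generic chaining DP)."""
    order = sorted(hits, key=lambda h: (h.q_start, h.s_start))
    n = len(order)
    if n == 0:
        return []
    best = [h.score for h in order]  # best chain score ending at each hit
    prev = [-1] * n                  # back-pointer to the previous hit in that chain
    for j in range(n):
        for i in range(j):
            a, b = order[i], order[j]
            colinear = a.q_end <= b.q_start and a.s_end <= b.s_start
            # Assumption: a genomic gap longer than max_intron ends a compartment.
            gap_ok = b.s_start - a.s_end <= max_intron
            if colinear and gap_ok and best[i] + b.score > best[j]:
                best[j] = best[i] + b.score
                prev[j] = i
    j = max(range(n), key=lambda k: best[k])
    chain = []
    while j != -1:
        chain.append(order[j])
        j = prev[j]
    return chain[::-1]


def compartments(hits: List[Hit], max_intron: int = 100_000,
                 min_score: float = 0.0) -> List[List[Hit]]:
    """Greedily extract chains; each chain is treated as one hypothetical compartment."""
    remaining = list(hits)
    result = []
    while remaining:
        chain = best_chain(remaining, max_intron)
        if sum(h.score for h in chain) < min_score:
            break  # remaining hits are too weak to form another compartment
        result.append(chain)
        used = set(chain)  # frozen dataclass instances are hashable
        remaining = [h for h in remaining if h not in used]
    return result


if __name__ == "__main__":
    # Toy data: a two-exon gene and a similar copy about 49 kb downstream.
    gene = [Hit(0, 100, 1000, 1100, 100.0), Hit(100, 300, 5000, 5200, 200.0)]
    copy = [Hit(0, 100, 50000, 50100, 95.0), Hit(100, 300, 54000, 54200, 190.0)]
    for comp in compartments(gene + copy, max_intron=10_000):
        print([(h.s_start, h.s_end) for h in comp])
```

On the toy data the two copies come out as separate compartments, the stronger one first. Because each round removes at least one hit, the greedy loop always terminates; a real tool would also handle the minus strand and score overlapping or conflicting hits more carefully.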
Pages: 13