A graph based algorithm for generating EST consensus sequences

Cited by: 14
Authors
Malde, K [1]
Coward, E
Jonassen, I
Affiliations
[1] Univ Bergen, Dept Informat, N-5020 Bergen, Norway
[2] Univ Bergen, Bergen Ctr Computat Sci, Computat Biol Unit, N-5020 Bergen, Norway
DOI
10.1093/bioinformatics/bti184
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: EST sequences constitute an abundant yet error-prone resource for computational biology. Expressed sequences are important in gene discovery and identification, and they are also crucial for the discovery and classification of alternative splicing. An important challenge when processing EST sequences is the reconstruction of mRNA by assembling EST clusters into consensus sequences.
Results: In contrast to the more established assembly tools, we propose an algorithm that constructs a graph over sequence fragments of fixed size and produces consensus sequences as traversals of this graph. We provide a tool implementing this algorithm and perform an experiment in which the consensus sequences produced by our implementation, as well as by currently available tools, are compared to mRNA. The results show that, in a majority of cases, our proposed algorithm produces consensus sequences of higher quality than the established sequence assemblers, and at a competitive speed.
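The abstract only outlines the approach (a graph over fixed-size sequence fragments, with consensus sequences read off as traversals), so the following is a minimal sketch of that general idea, not the authors' implementation. It assumes a de Bruijn-style graph whose nodes are k-mers and whose edge weights count how often one k-mer directly follows another in the ESTs, and it swaps in a simple greedy heaviest-edge traversal. The function names `build_kmer_graph` and `greedy_consensus`, the choice of k, and the start node are all hypothetical.

```python
from collections import defaultdict


def build_kmer_graph(reads, k):
    """Count edges between consecutive overlapping k-mers across all reads.

    Assumption: fragments are plain k-mers; the paper may define them differently.
    """
    edges = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k):
            src = read[i:i + k]
            dst = read[i + 1:i + k + 1]
            edges[src][dst] += 1
    return edges


def greedy_consensus(edges, start):
    """Follow the most frequently observed edge from `start`.

    Stops at a node with no successors or when the best successor was
    already visited. Ties are broken alphabetically. This greedy rule is
    an illustrative stand-in, not the traversal described in the paper.
    """
    consensus = start
    node = start
    visited = {start}
    while edges.get(node):
        successors = edges[node]
        nxt = max(sorted(successors), key=successors.get)
        if nxt in visited:
            break
        visited.add(nxt)
        consensus += nxt[-1]  # each step extends the sequence by one base
        node = nxt
    return consensus


# Three toy "ESTs"; the third carries a single substitution error.
reads = ["ACGTTGCA", "ACGTTGCA", "ACGATGCA"]
graph = build_kmer_graph(reads, k=3)
print(greedy_consensus(graph, "ACG"))
```

In this toy input the erroneous read contributes a lower-weight branch at `ACG`, so the traversal follows the path supported by the two agreeing reads. Real EST data would also require handling reverse complements, repeats, and alternative splice variants, none of which this sketch attempts.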
Pages: 1371-1375
Page count: 5