De novo repeat classification and fragment assembly

被引:142
作者
Pevzner, PA
Tang, HX
Tesler, G [1 ]
机构
[1] Univ Calif San Diego, Dept Math, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
关键词
D O I
10.1101/gr.2395204
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
Repetitive sequences make up a significant fraction of almost any genome, and an important and still open question in bioinformatics is how to represent all repeats in DNA sequences. We propose a new approach to repeat classification that represents all repeats in a genome as a mosaic of sub-repeats. Our key algorithmic idea also leads to new approaches to multiple alignment and fragment assembly. In particular, we show that our FragmentGluer assembler improves on Phrap and ARACHNE in assembly of BACs and bacterial genomes.
引用
收藏
页码:1786 / 1796
页数:11
相关论文
共 39 条
[1]  
Agarwal P, 1994, Proc Int Conf Intell Syst Mol Biol, V2, P1
[2]  
[Anonymous], 1989, INTRO ALGORITHMS
[3]  
[Anonymous], 1997, ALGORITHMS STRINGS T, DOI DOI 10.1017/CBO9780511574931
[4]   Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes [J].
Aparicio, S ;
Chapman, J ;
Stupka, E ;
Putnam, N ;
Chia, J ;
Dehal, P ;
Christoffels, A ;
Rash, S ;
Hoon, S ;
Smit, A ;
Gelpke, MDS ;
Roach, J ;
Oh, T ;
Ho, IY ;
Wong, M ;
Detter, C ;
Verhoef, F ;
Predki, P ;
Tay, A ;
Lucas, S ;
Richardson, P ;
Smith, SF ;
Clark, MS ;
Edwards, YJK ;
Doggett, N ;
Zharkikh, A ;
Tavtigian, SV ;
Pruss, D ;
Barnstead, M ;
Evans, C ;
Baden, H ;
Powell, J ;
Glusman, G ;
Rowen, L ;
Hood, L ;
Tan, YH ;
Elgar, G ;
Hawkins, T ;
Venkatesh, B ;
Rokhsar, D ;
Brenner, S .
SCIENCE, 2002, 297 (5585) :1301-1310
[5]   A new approach to sequence comparison:: normalired sequence alignment [J].
Arslan, AN ;
Egecioglu, Ö ;
Pevzner, PA .
BIOINFORMATICS, 2001, 17 (04) :327-337
[6]   Human-specific duplication and mosaic transcripts: The recent paralogous structure of chromosome 22 [J].
Bailey, JA ;
Yavor, AM ;
Viggiano, L ;
Misceo, D ;
Horvath, JE ;
Archidiacono, N ;
Schwartz, S ;
Rocchi, M ;
Eichler, EE .
AMERICAN JOURNAL OF HUMAN GENETICS, 2002, 70 (01) :83-100
[7]   Automated de novo identification of repeat sequence families in sequenced genomes [J].
Bao, ZR ;
Eddy, SR .
GENOME RESEARCH, 2002, 12 (08) :1269-1276
[8]  
Batzoglou S, 2002, GENOME RES, V12, P177, DOI 10.1101/gr.208902
[9]   MaskerAid:: a performance enhancement to RepeatMasker [J].
Bedell, JA ;
Korf, I ;
Gish, W .
BIOINFORMATICS, 2000, 16 (11) :1040-1041
[10]  
Böcker S, 2003, LECT N BIOINFORMAT, V2812, P476