FORRepeats: detects repeats on entire chromosomes and between genomes

Cited by: 40
Authors
Lefebvre, A [1]
Lecroq, T [1]
Dauchel, H [1]
Alexandre, J [1]
Affiliations
[1] Univ Rouen, ABISS, F-76821 Mont St Aignan, France
DOI
10.1093/bioinformatics/btf843
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: As more and more whole genomes become available, there is a need for new methods to compare large sequences and transfer biological knowledge from annotated genomes to related new ones. BLAST is not suitable for comparing multimegabase DNA sequences. MegaBLAST is designed to compare closely related large sequences. Tools for detecting repeats in large sequences, such as MUMmer and REPuter, have already been developed, but they too have time or space restrictions. Moreover, in terms of applications, REPuter only computes repeats and MUMmer works better with related genomes.
Results: We present a heuristic method, named FORRepeats, which is based on a novel data structure called the factor oracle. In the first step it detects exact repeats in large sequences. In the second step it computes approximate repeats and performs pairwise comparison. We compared its computational characteristics with those of BLAST and REPuter. The results demonstrate that it is fast and space economical. We show FORRepeats' ability to perform intra-genomic comparison and to detect repeated DNA sequences in the complete genome of the model plant Arabidopsis thaliana.
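The abstract names the factor oracle but does not describe it. As a rough illustration only, the sketch below builds a factor oracle with the standard online construction (one state per prefix, plus "supply" links that add shortcut transitions). The function names `build_factor_oracle` and `accepts` are hypothetical, and nothing here is claimed to match how FORRepeats itself stores the structure or uses it to report repeats.

```python
def build_factor_oracle(text):
    """Build a factor oracle for `text` (standard online construction).

    Returns (transitions, supply):
      transitions[i] maps a symbol to the state reached from state i,
      supply[i] is the supply (failure-like) link of state i, -1 for state 0.
    The automaton has len(text) + 1 states, numbered 0..len(text).
    """
    n = len(text)
    transitions = [dict() for _ in range(n + 1)]
    supply = [-1] * (n + 1)
    for i, symbol in enumerate(text):
        # Spine transition: state i reads text[i] and moves to state i + 1.
        transitions[i][symbol] = i + 1
        # Follow supply links, adding shortcut transitions where missing.
        k = supply[i]
        while k != -1 and symbol not in transitions[k]:
            transitions[k][symbol] = i + 1
            k = supply[k]
        supply[i + 1] = 0 if k == -1 else transitions[k][symbol]
    return transitions, supply


def accepts(transitions, word):
    """Return True if `word` labels a path starting at state 0."""
    state = 0
    for symbol in word:
        state = transitions[state].get(symbol)
        if state is None:
            return False
    return True


if __name__ == "__main__":
    oracle, links = build_factor_oracle("abbbaab")
    print(accepts(oracle, "bbaa"))  # a true factor of the text
    print(accepts(oracle, "abba"))  # accepted although it is not a factor
```

The oracle recognizes every factor (substring) of the text, but it can also accept a few words that are not factors, such as `abba` for the text `abbbaab`. Any repeat candidate read off the structure therefore needs to be checked against the sequence, which is one plausible reading of why the abstract calls the method a heuristic.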
Pages: 319-326
Page count: 8