FORRepeats: detects repeats on entire chromosomes and between genomes

Times cited: 40
Authors
Lefebvre, A [1]
Lecroq, T [1]
Dauchel, H [1]
Alexandre, J [1]
Affiliation
[1] Univ Rouen, ABISS, F-76821 Mont St Aignan, France
DOI
10.1093/bioinformatics/btf843
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: As more and more whole genomes become available, new methods are needed to compare large sequences and to transfer biological knowledge from annotated genomes to related, newly sequenced ones. BLAST is not suited to comparing multimegabase DNA sequences, and MegaBLAST is designed only for closely related large sequences. Tools for detecting repeats in large sequences, such as MUMmer and REPuter, have already been developed, but they also have time or space restrictions. Moreover, in terms of applications, REPuter only computes repeats, and MUMmer works better with related genomes.
Results: We present a heuristic method, named FORRepeats, based on a novel data structure called the factor oracle. In a first step, it detects exact repeats in large sequences; in a second step, it computes approximate repeats and performs pairwise comparison. We compared its computational characteristics with those of BLAST and REPuter, and the results show that it is fast and space-economical. We also demonstrate the ability of FORRepeats to perform intra-genomic comparison and to detect repeated DNA sequences in the complete genome of the model plant Arabidopsis thaliana.
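FORRepeats is built on the factor oracle, an automaton with m + 1 states for a word of length m that accepts at least every factor (substring) of that word and can be built online in linear time. The Python sketch below follows the standard online construction of the factor oracle (cf. Allauzen 1999 in the reference list); it is not the FORRepeats code, and the repeat-detection and approximate-repeat steps described in the abstract are not reproduced. The function names `build_factor_oracle` and `accepts` are hypothetical, chosen for illustration only.

```python
from typing import Dict, List, Tuple


def build_factor_oracle(word: str) -> Tuple[List[Dict[str, int]], List[int]]:
    """Build the factor oracle of `word` online, one letter at a time.

    Hypothetical helper. Returns (transitions, supply):
      transitions[i] maps a letter to the target state of that transition;
      supply[i] is the supply (suffix) link of state i, with supply[0] == -1.
    """
    m = len(word)
    transitions: List[Dict[str, int]] = [dict() for _ in range(m + 1)]
    supply: List[int] = [-1] * (m + 1)
    for i, letter in enumerate(word, start=1):
        # Internal transition: state i-1 --letter--> state i.
        transitions[i - 1][letter] = i
        # Follow supply links from the previous state, adding external
        # transitions to the new state i until some state already has a
        # transition labelled `letter` (or the chain runs out at -1).
        k = supply[i - 1]
        while k > -1 and letter not in transitions[k]:
            transitions[k][letter] = i
            k = supply[k]
        # Supply link of the new state: 0 if `letter` never occurred before,
        # otherwise the target of the existing transition found above.
        supply[i] = 0 if k == -1 else transitions[k][letter]
    return transitions, supply


def accepts(transitions: List[Dict[str, int]], pattern: str) -> bool:
    """Return True if a path spelling `pattern` exists (every state is terminal)."""
    state = 0
    for letter in pattern:
        nxt = transitions[state].get(letter)
        if nxt is None:
            return False
        state = nxt
    return True


if __name__ == "__main__":
    trans, links = build_factor_oracle("abbbaab")
    print(links)                  # [-1, 0, 0, 2, 3, 1, 1, 2]
    print(accepts(trans, "aba"))  # True, although "aba" is not a factor
    print(accepts(trans, "abab")) # False
```

The "aba" case shows why the structure is only an oracle: it never rejects a true factor, but it may accept a few words that are not factors. This is the trade-off for its small size (at most 2m - 1 transitions), and one reason a method built on it can be heuristic.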
Pages: 319-326
Page count: 8
References (22 in total)
[1] Allauzen C. Lecture Notes in Computer Science, 1999, 1725:295.
[2] Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997, 25(17):3389-3402.
[3] Bennetzen JL. Comparative sequence analysis of plant nuclear genomes: microcolinearity and its many exceptions. Plant Cell, 2000, 12(7):1021-1029.
[4] Boucher Y, Nesbo CL, Doolittle WF. Microbial genomes: dealing with diversity. Current Opinion in Microbiology, 2001, 4(3):285-289.
[5] Copenhaver GP, Pikaard CS. Two-dimensional RFLP analyses reveal megabase-sized clusters of rRNA gene variants in Arabidopsis thaliana, suggesting local spreading of variants as the mode for gene homogenization during concerted evolution. Plant Journal, 1996, 9(2):273-282.
[6] Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL. Alignment of whole genomes. Nucleic Acids Research, 1999, 27(11):2369-2376.
[7] Heslop-Harrison JS, Murata M, Ogura Y, Schwarzacher T, Motoyoshi F. Polymorphisms and genomic organization of repetitive DNA from centromeric regions of Arabidopsis chromosomes. Plant Cell, 1999, 11(1):31-42.
[8] Karlin S, Campbell AM, Mrázek J. Comparative DNA analysis across diverse genomes. Annual Review of Genetics, 1998, 32:185-225.
[9] Kaul S, Koo HL, Jenkins J, Rizzo M, Rooney T, Tallon LJ, Feldblyum T, Nierman W, Benito MI, Lin XY, Town CD, Venter JC, Fraser CM, Tabata S, Nakamura Y, Kaneko T, Sato S, Asamizu E, Kato T, Kotani H, Sasamoto S, Ecker JR, Theologis A, Federspiel NA, Palm CJ, Osborne BI, Shinn P, Conway AB, Vysotskaia VS, Dewar K, Conn L, Lenz CA, Kim CJ, Hansen NF, Liu SX, Buehler E, Altafi H, Sakano H, Dunn P, Lam B, Pham PK, Chao Q, Nguyen M, Yu GX, Chen HM, Southwick A, Lee JM, Miranda M, Toriumi MJ, Davis RW. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 2000, 408(6814):796-815.
[10] Kurtz S. Proceedings of the International Conference on Intelligent Systems for Molecular Biology, 2000, 8:228.