Repseek, a tool to retrieve approximate repeats from large DNA sequences

Cited by: 56
Authors
Achaz, Guillaume
Boyer, Frederic
Rocha, Eduardo P. C.
Viari, Alain
Coissac, Eric
Affiliations
[1] Univ Paris 06, Atelier Bioinformat, F-75005 Paris, France
[2] Univ Paris 06, UMR 7138, F-75252 Paris 05, France
[3] INRIA Rhone Alpes Projet HELIX, F-38334 Saint Ismier, France
[4] Inst Pasteur, Unite Genet Genom Bacteriens, F-75724 Paris 15, France
[5] Univ Grenoble 1, LAPM, UMR 5163, F-38041 Grenoble 9, France
DOI
10.1093/bioinformatics/btl519
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Chromosomes and other long DNA sequences contain many highly similar repeated subsequences. While efficient methods exist for detecting strict repeats or already characterized repeats, no software is available for detecting approximate repeats in large DNA sequences that allows for weighted substitutions and indels within a coherent statistical framework. Here, we present an implementation of a two-step method (seed detection followed by seed extension) that detects such approximate repeats. Our method is computationally efficient enough to handle large sequences and flexible enough to account for influencing factors, such as sequence-composition biases, at both the seed detection and alignment levels.
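The two-step idea in the abstract (detect seeds, then extend them) can be illustrated with a minimal Python sketch. This is not Repseek's implementation: the function names `find_seeds` and `extend_right`, the fixed seed length `k`, the scoring values, and the `xdrop` stopping rule are all assumptions chosen for illustration. The extension here is ungapped and only runs rightwards, whereas the abstract describes alignments with weighted substitutions and indels and a statistical framework that this sketch does not attempt.

```python
from collections import defaultdict


def find_seeds(seq, k):
    """Step 1 (illustrative): return sorted pairs (i, j), i < j,
    such that the k-mers starting at i and j are identical."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    seeds = []
    for pos in positions.values():
        for a in range(len(pos)):
            for b in range(a + 1, len(pos)):
                seeds.append((pos[a], pos[b]))
    return sorted(seeds)


def extend_right(seq, i, j, k, match=1, mismatch=-2, xdrop=3):
    """Step 2 (illustrative): ungapped rightward extension of the seed (i, j).

    Scoring values and the xdrop cutoff are hypothetical. Extension stops
    once the running score falls more than `xdrop` below the best score
    seen. Returns (best_length, best_score).
    """
    score = best = k * match
    best_len = length = k
    while j + length < len(seq):
        score += match if seq[i + length] == seq[j + length] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > xdrop:
            break
    return best_len, best
```

For example, on `"TTTACGTACGGCCCCACGTTCGGAA"` with `k=4`, `find_seeds` reports the pairs `(2, 6)` and `(3, 15)`, and extending the seed `(3, 15)` yields an approximate repeat of length 8 containing one mismatch.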
Pages: 119-121
Number of pages: 3