De novo identification of repeat families in large genomes

Cited by: 1496
Authors
Price, AL [1]
Jones, NC [1]
Pevzner, PA [1]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
DOI
10.1093/bioinformatics/bti1018
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: De novo repeat family identification is a challenging algorithmic problem of great practical importance. As the number of genome sequencing projects increases, there is a pressing need to identify the repeat families present in large, newly sequenced genomes. We develop a new method for de novo identification of repeat families via extension of consensus seeds; our method enables a rigorous definition of repeat boundaries, a key issue in repeat analysis.
Results: Our RepeatScout algorithm is more sensitive and is orders of magnitude faster than RECON, the dominant tool for de novo repeat family identification in newly sequenced genomes. Using RepeatScout, we estimate that approximately 2% of the human genome and 4% of the mouse and rat genomes consist of previously unannotated repetitive sequence.
Pages: i351-i358
Number of pages: 8