STAR: An algorithm to search for tandem approximate repeats

Cited by: 70
Authors
Delgrange, O
Rivals, E
Affiliations
[1] Univ Mons Hainaut, Serv Informat Gen, B-7000 Mons, Belgium
[2] CNRS, LIRMM, UMR 5506, F-34392 Montpellier 5, France
DOI
10.1093/bioinformatics/bth335
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Tandem repeats consist of approximate, adjacent repetitions of a DNA motif. Such repeats account for large portions of eukaryotic genomes and have also been found in other kingdoms of life. Owing to their polymorphism, tandem repeats have proven useful in genome cartography, forensics, and population studies, among other applications. Nevertheless, they are neither systematically detected nor annotated in genome projects. Partly because of this lack of data, their evolution is still poorly understood.

Results: In this work, we design an exact algorithm to locate approximate tandem repeats (ATRs) of a motif in a DNA sequence. Given a motif and a DNA sequence, our method, named STAR, identifies all segments of the sequence that correspond to significant approximate tandem repetitions of the motif. In our model, an Exact Tandem Repeat (ETR) arises from the tandem duplication of the motif, and an ATR derives from an ETR through a series of point mutations. An ATR can therefore be encoded as a number of duplications of the motif together with a list of mutations. Consequently, a sequence that is not an ATR cannot be encoded efficiently by this description, whereas a true ATR can. Our method uses the minimum description length criterion to identify which sequence segments are ATRs. Our optimization procedure guarantees that STAR finds a combination of ATRs that minimizes this criterion.
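The encoding idea in the abstract can be illustrated with a toy calculation. The sketch below is not the STAR algorithm or its actual code lengths. It assumes a simplified model in which the only mutations are substitutions, the segment starts in phase with the motif, a literal nucleotide costs 2 bits, and positions and counts are written as fixed-width integers. The function names `raw_cost_bits`, `atr_cost_bits`, and `compression_gain` are hypothetical and chosen for illustration only.

```python
import math


def raw_cost_bits(segment: str) -> float:
    """Cost of spelling the segment out literally: 2 bits per nucleotide."""
    return 2.0 * len(segment)


def atr_cost_bits(segment: str, motif: str) -> float:
    """Toy cost of describing the segment as motif copies plus substitutions.

    Assumption (not from the paper): substitutions only, no insertions or
    deletions, and the segment is aligned with the start of the motif.
    """
    n, m = len(segment), len(motif)
    if n == 0:
        return 0.0
    # Exact tandem repeat (ETR) of the same length as the segment.
    etr = (motif * (n // m + 1))[:n]
    mismatches = sum(1 for a, b in zip(segment, etr) if a != b)
    # Header: repeat length and number of mutations, each as an integer in 0..n.
    header_bits = 2 * math.log2(n + 1)
    # Each substitution: its position (one of n) and one of 3 alternative bases.
    per_mutation_bits = math.log2(n) + math.log2(3)
    return header_bits + mismatches * per_mutation_bits


def compression_gain(segment: str, motif: str) -> float:
    """Positive when the repeat-based description is shorter than the literal one."""
    return raw_cost_bits(segment) - atr_cost_bits(segment, motif)


if __name__ == "__main__":
    motif = "ACG"
    for seg in ("ACGACGACGACG", "ACGACTACGACG", "TTGCATGGCATA"):
        print(seg, round(compression_gain(seg, motif), 2))
```

Under these assumptions, a segment close to an exact repeat of the motif yields a positive gain, while an unrelated segment needs so many listed mutations that the literal description is shorter. This mirrors the minimum description length argument, with the caveat that the real method also handles other point mutations and optimizes over combinations of segments.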
Pages: 2812-2820
Number of pages: 9