Detecting localized repeats in genomic sequences: A new strategy and its application to Bacillus subtilis and Arabidopsis thaliana sequences

Cited by: 7
Authors
Klaerr-Blanchard, M
Chiapello, H
Coward, E
Affiliations
[1] Inst Pasteur, Unite Regulat Express Genet, F-75724 Paris 15, France
[2] INRA, Biol Cellulaire Lab, F-78026 Versailles, France
[3] Univ Versailles Quentin Yvelines, Lab Genome & Informat, F-78035 Versailles, France
Source
COMPUTERS & CHEMISTRY | 2000 / Volume 24 / Issue 01
Keywords
genome analysis; inexact repeats; Arabidopsis thaliana; Bacillus subtilis
DOI
10.1016/S0097-8485(99)00047-9
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
A new method for finding local repeats in long DNA sequences, such as complete genomes, is presented. It detects a large variety of repeats varying in length from one to several hundred bases, which may contain many mutations. By mutations we mean substitutions, insertions or deletions of one or more bases. The method is based on counting occurrences of short words (3-12 bases) in sequence fragments called windows. A score is computed for each window, based on calculating exact word occurrence probabilities for all the words of a given length in the window. The probabilities are defined using a Bernoulli model (independent letters) for the sequence, using the actual letter frequencies from each window. A plot of the probabilities along the sequence for high-scoring windows facilitates the identification of the repeated patterns. We applied the method to the 1.87 Mb sequence of chromosome 4 of Arabidopsis thaliana and to the complete genome of Bacillus subtilis (4.2 Mb). The repeats that we found were classified according to their size, number of occurrences, distance between occurrences, and location with respect to genes. The method proves particularly useful in detecting long, inexact repeats that are local, but not necessarily tandem. The method is implemented as a C program called EXCEP, which is available on request from the authors. © 2000 Elsevier Science Ltd. All rights reserved.
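The windowed word-counting idea described in the abstract can be illustrated with a minimal Python sketch. This is not the EXCEP program: the function names, the binomial tail approximation (which ignores overlaps between word occurrences, unlike the exact probabilities the abstract mentions), and the score defined as a sum of negative log-probabilities are all assumptions made for illustration. It is meant for windows of a few hundred bases at most.

```python
from collections import Counter
from math import comb, log10, prod


def word_tail_probability(count, positions, p):
    """P(X >= count) for X ~ Binomial(positions, p).

    Assumption: a binomial approximation that ignores overlaps between
    word occurrences; it stands in for the exact probabilities.
    """
    tail = sum(
        comb(positions, i) * p**i * (1 - p) ** (positions - i)
        for i in range(count, positions + 1)
    )
    return max(tail, 1e-300)  # guard against underflow before taking a log


def window_score(window, k):
    """Hypothetical score: sum of -log10 tail probabilities of repeated k-words."""
    # Bernoulli model: independent letters, frequencies taken from this window.
    freqs = {base: n / len(window) for base, n in Counter(window).items()}
    positions = len(window) - k + 1
    words = Counter(window[i:i + k] for i in range(positions))
    score = 0.0
    for word, count in words.items():
        if count < 2:
            continue  # a word seen once carries no repeat signal
        p = prod(freqs[base] for base in word)
        score += -log10(word_tail_probability(count, positions, p))
    return score


def scan(sequence, window_size, step, k):
    """Return (start, score) for each window slid along the sequence."""
    return [
        (start, window_score(sequence[start:start + window_size], k))
        for start in range(0, len(sequence) - window_size + 1, step)
    ]
```

Calling `scan(sequence, window_size=200, step=100, k=6)` on a DNA string returns one score per window; windows whose words occur more often than the local letter frequencies predict receive the highest scores and are the candidates for closer inspection.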
Pages: 57-70
Page count: 14