Exhaustive whole-genome tandem repeats search

Cited by: 27
Authors
Krishnan, A [1]
Tang, F [1]
Affiliations
[1] Bioinformatics Institute, Singapore 138671, Singapore
Keywords
DOI
10.1093/bioinformatics/bth311
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Approximate tandem repeats (ATRs) occur frequently in the genomes of organisms and are a source of the polymorphisms observed between individuals, which makes them of interest to those studying genetic disorders. Although extensive work has been done to identify ATRs, current approaches are inherently limited in the number of pattern sizes that can be searched or in the length of the input sequence.

Results: This paper describes (1) a new algorithm that exhaustively finds all variable-length ATRs in a genomic sequence and (2) a precise description of, and an algorithm to significantly reduce, redundancy in the output. Our ATR definition is parameterized by a mismatch ratio p, which allows more mismatches in longer tandem repeats (and fewer in shorter ones). Furthermore, our algorithm is embarrassingly parallel and can thus attain near-linear speed-up on Beowulf clusters. We present results of our algorithm applied to sequences of widely differing lengths (from genes to chromosomes).
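To make the role of the mismatch ratio p concrete, the following is a minimal brute-force sketch, not the algorithm from the paper. It assumes Hamming distance between two adjacent copies and a mismatch budget of floor(p * period); the abstract does not state the exact definition, and the names `hamming`, `is_atr`, and `naive_atr_scan` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_atr(seq: str, start: int, period: int, p: float) -> bool:
    """Check whether two adjacent copies of length `period` begin at `start`.

    Assumption (not from the paper): the copies may differ in at most
    floor(p * period) positions, so longer periods tolerate more mismatches.
    """
    left = seq[start:start + period]
    right = seq[start + period:start + 2 * period]
    if len(right) < period:
        return False  # not enough sequence left for a second copy
    return hamming(left, right) <= int(p * period)


def naive_atr_scan(seq: str, p: float, min_period: int = 1, max_period=None):
    """Try every period and every start position; return (start, period) hits.

    Each period is examined independently of the others, which is one way
    such a search could be split across cluster nodes.
    """
    n = len(seq)
    if max_period is None:
        max_period = n // 2
    hits = []
    for period in range(min_period, max_period + 1):
        for start in range(0, n - 2 * period + 1):
            if is_atr(seq, start, period, p):
                hits.append((start, period))
    return hits
```

With p = 0.25, a period of 4 allows one mismatch, so `is_atr("ACGTACGA", 0, 4, 0.25)` accepts the pair ACGT/ACGA, while p = 0.2 gives a budget of zero and rejects it. This exhaustive scan also reports many overlapping hits for the same repeat, the kind of output redundancy the abstract says the paper sets out to reduce.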
Pages: 2702-2710
Page count: 9
Related papers
14 in total
[1]   Tandem repeats finder: a program to analyze DNA sequences [J].
Benson, G .
NUCLEIC ACIDS RESEARCH, 1999, 27 (02) :573-580
[2]   Association of autism severity with a monoamine oxidase A functional polymorphism [J].
Cohen, IL ;
Liu, X ;
Schutz, C ;
White, BN ;
Jenkins, EC ;
Brown, WT ;
Holden, JJA .
CLINICAL GENETICS, 2003, 64 (03) :190-197
[3]   An exhaustive DNA micro-satellite map of the human genome using high performance computing [J].
Collins, JR ;
Stephens, RM ;
Gold, B ;
Long, B ;
Dean, M ;
Burt, SK .
GENOMICS, 2003, 82 (01) :10-19
[4]   Identification of a mutant allele of the androgen receptor gene in a family with androgen insensitivity syndrome: Detection of carriers and prenatal diagnosis [J].
Giuseppina Fogu ;
Veronica Bertini ;
Salvatore Dessole ;
Pasquale Bandiera ;
Paola Maria Campus ;
Giampiero Capobianco ;
Raimonda Sanna ;
Giovanna Soro ;
Andrea Montella .
Archives of Gynecology and Obstetrics, 2003, 269 (1) :25-29
[5]   Speeding up the detection of evolutive tandem repeats [J].
Groult, R ;
Léonard, M ;
Mouchard, L .
THEORETICAL COMPUTER SCIENCE, 2004, 310 (1-3) :309-328
[6]  
GROULT R, 2002, P 27 S MATH FDN COMP, P292
[7]   Myelin basic protein gene is associated with MS in DR4- and DR5-positive Italians and Russians [J].
Guerini, FR ;
Ferrante, P ;
Losciale, L ;
Caputo, D ;
Lombardi, ML ;
Pirozzi, G ;
Luongo, V ;
Sudomoina, MA ;
Andreewski, TV ;
Alekseenkov, AD ;
Boiko, AN ;
Gusev, EI ;
Favorova, OO .
NEUROLOGY, 2003, 61 (04) :520-526
[8]   Finding approximate repetitions under Hamming distance [J].
Kolpakov, R ;
Kucherov, G .
THEORETICAL COMPUTER SCIENCE, 2003, 303 (01) :135-156
[9]   REPuter: the manifold applications of repeat analysis on a genomic scale [J].
Kurtz, S ;
Choudhuri, JV ;
Ohlebusch, E ;
Schleiermacher, C ;
Stoye, J ;
Giegerich, R .
NUCLEIC ACIDS RESEARCH, 2001, 29 (22) :4633-4642
[10]   An algorithm for approximate tandem repeats [J].
Landau, GM ;
Schmidt, JP ;
Sokol, D .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2001, 8 (01) :1-18