T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm

Cited by: 128
Authors
Jorda, Julien [1]
Kajava, Andrey V. [1]
Affiliations
[1] Univ Montpellier 1 & 2, CNRS, UMR 5237, Ctr Rech Biochim Macromol, Montpellier, France
Keywords
PROTEIN SEQUENCES; ALIGNMENT; SERVER; DNA
DOI
10.1093/bioinformatics/btp482
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Over the last years a number of evidences have been accumulated about high incidence of tandem repeats in proteins carrying fundamental biological functions and being related to a number of human diseases. At the same time, frequently, protein repeats are strongly degenerated during evolution and, therefore, cannot be easily identified. To solve this problem, several computer programs which were based on different algorithms have been developed. Nevertheless, our tests showed that there is still room for improvement of methods for accurate and rapid detection of tandem repeats in proteins.
Results: We developed a new program called T-REKS for ab initio identification of the tandem repeats. It is based on clustering of lengths between identical short strings by using a K-means algorithm. Benchmark of the existing programs and T-REKS on several sequence datasets is presented. Our program being linked to the Protein Repeat DataBase opens the way for large-scale analysis of protein tandem repeats. T-REKS can also be applied to the nucleotide sequences.
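The idea stated in the abstract, clustering the lengths between identical short strings with K-means, can be illustrated with a minimal Python sketch. This is not the T-REKS implementation: the function names (`gap_lengths`, `kmeans_1d`, `dominant_period`), the word length `k`, the number of clusters, and the quantile-based initialisation are all assumptions made for illustration, and the sketch ignores the degenerate repeats and alignment steps a real tool would need.

```python
def gap_lengths(seq, k=2):
    """Distances between consecutive occurrences of each identical k-letter word."""
    last_seen = {}
    gaps = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            gaps.append(i - last_seen[word])
        last_seen[word] = i
    return gaps


def assign(values, centers):
    """Put every value into the group of its nearest center."""
    groups = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
        groups[nearest].append(v)
    return groups


def kmeans_1d(values, n_clusters=2, n_iter=50):
    """Plain Lloyd's K-means on scalars (illustrative, not the paper's variant)."""
    ordered = sorted(values)
    # Assumed deterministic start: centers placed at evenly spaced quantiles.
    centers = [
        float(ordered[(2 * j + 1) * len(ordered) // (2 * n_clusters)])
        for j in range(n_clusters)
    ]
    for _ in range(n_iter):
        groups = assign(values, centers)
        # An empty group keeps its previous center.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(values, centers)


def dominant_period(seq, k=2, n_clusters=2):
    """Rounded center of the most populated gap cluster, or None if no word repeats."""
    gaps = gap_lengths(seq, k)
    if not gaps:
        return None
    centers, groups = kmeans_1d(gaps, n_clusters)
    largest = max(range(len(groups)), key=lambda j: len(groups[j]))
    return round(centers[largest])


print(dominant_period("ABCDE" * 6))  # prints 5
```

For a perfect repeat such as `"ABCDE" * 6`, every gap between identical two-letter words equals the repeat length, so the largest cluster is centered on 5. In a degenerate repeat the gaps would spread over several values, and the most populated cluster only suggests a candidate repeat length.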
Pages: 2632-2638
Number of pages: 7