T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm

Cited by: 128
Authors
Jorda, Julien [1]
Kajava, Andrey V. [1]
Affiliations
[1] Univ Montpellier 1 & 2, CNRS, UMR 5237, Ctr Rech Biochim Macromol, Montpellier, France
Keywords
PROTEIN SEQUENCES; ALIGNMENT; SERVER; DNA
DOI
10.1093/bioinformatics/btp482
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: In recent years, evidence has accumulated for a high incidence of tandem repeats in proteins that carry out fundamental biological functions and are associated with a number of human diseases. At the same time, protein repeats often degenerate strongly during evolution and therefore cannot be easily identified. Several computer programs based on different algorithms have been developed to solve this problem. Nevertheless, our tests showed that there is still room for improvement in methods for accurate and rapid detection of tandem repeats in proteins.
Results: We developed a new program, T-REKS, for ab initio identification of tandem repeats. It is based on clustering the lengths between identical short strings with a K-means algorithm. We present a benchmark of existing programs and T-REKS on several sequence datasets. Because it is linked to the Protein Repeat DataBase, T-REKS opens the way for large-scale analysis of protein tandem repeats. T-REKS can also be applied to nucleotide sequences.
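The clustering idea named in the Results can be illustrated with a minimal Python sketch. It is not the T-REKS implementation: the function names (`kmer_gap_lengths`, `kmeans_1d`, `dominant_period`), the short-string length `k=2`, the number of clusters, and the deterministic initialisation are all assumptions made for illustration. The sketch collects the distances between consecutive occurrences of identical short strings and groups those distances with a plain one-dimensional K-means; the centre of the most populated cluster is taken as a candidate repeat length.

```python
from collections import defaultdict


def kmer_gap_lengths(seq, k=2):
    """Distances between consecutive occurrences of each identical k-mer."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    gaps = []
    for pos in positions.values():
        gaps.extend(b - a for a, b in zip(pos, pos[1:]))
    return gaps


def kmeans_1d(values, n_clusters=2, n_iter=20):
    """Plain Lloyd's K-means on scalars; returns (centers, labels)."""
    if not values:
        return [], []
    distinct = sorted(set(values))
    # Never ask for more clusters than there are distinct values.
    n_clusters = max(1, min(n_clusters, len(distinct)))
    # Assumed initialisation: spread the centres over the sorted distinct values.
    step = max(n_clusters - 1, 1)
    centers = [float(distinct[i * (len(distinct) - 1) // step])
               for i in range(n_clusters)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest centre.
        labels = [min(range(n_clusters), key=lambda j: abs(v - centers[j]))
                  for v in values]
        # Update step: each centre moves to the mean of its members.
        updated = []
        for j in range(n_clusters):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, labels


def dominant_period(seq, k=2, n_clusters=2):
    """Candidate repeat length: centre of the most populated gap cluster."""
    gaps = kmer_gap_lengths(seq, k)
    centers, labels = kmeans_1d(gaps, n_clusters)
    if not centers:
        return None
    counts = [labels.count(j) for j in range(len(centers))]
    return round(centers[counts.index(max(counts))])


if __name__ == "__main__":
    # Toy sequence: a perfect 6-residue repeat followed by a short unrelated tail.
    print(dominant_period("ACDEFG" * 4 + "WWW"))
```

A real detector would still need to locate the repeat boundaries, align the copies, and score degenerate repeats; the sketch covers only the length-clustering step.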
Pages: 2632-2638
Page count: 7