TEclass-a tool for automated classification of unknown eukaryotic transposable elements

Cited by: 240
Authors
Abrusan, Gyorgy [1, 2]
Grundmann, Norbert [2]
DeMeester, Luc [1]
Makalowski, Wojciech [2]
Affiliations
[1] Katholieke Univ Leuven, Dept Biol, Lab Aquat Ecol & Evolutionary Biol, B-3000 Louvain, Belgium
[2] Univ Munster, Fac Med, Inst Bioinformat, D-48149 Munster, Germany
DOI
10.1093/bioinformatics/btp084
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The large number of sequenced genomes required the development of software that reconstructs the consensus sequences of transposons and other repetitive elements. However, the available tools usually focus on the accurate identification of raw repeats and provide no information about the taxonomic position of the reconstructed consensus sequences. TEclass is a tool that classifies unknown transposable elements into their four main functional categories, which reflect their mode of transposition: DNA transposons, long terminal repeats (LTRs), long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). TEclass uses a support vector machine (SVM), a machine learning method, to classify elements based on their oligomer frequencies. It achieves 90-97% accuracy in the classification of novel DNA and LTR repeats, and 75% for LINEs and SINEs.
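The abstract names oligomer frequencies as the features used for classification. As a rough illustration only, the sketch below computes a relative-frequency vector over all DNA k-mers of a sequence in Python. It is not the TEclass implementation: the function name `oligomer_frequencies`, the default `k = 4`, and the choice to skip windows containing ambiguous bases are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def oligomer_frequencies(sequence, k=4):
    """Return relative frequencies of all 4**k DNA k-mers, in lexicographic order.

    Hypothetical helper for illustration; k=4 is an assumed default,
    not a value taken from the TEclass paper.
    """
    seq = sequence.upper()
    valid = set("ACGT")
    # Count every overlapping window of length k that contains only A, C, G, T.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    # Emit one value per possible k-mer so every sequence maps to the same-length vector.
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


if __name__ == "__main__":
    vector = oligomer_frequencies("ACGTACGTTTGACA", k=2)
    print(len(vector))
```

A fixed-length vector of this kind could then be passed, together with known class labels, to any SVM implementation for training; which oligomer lengths and SVM settings TEclass actually uses is not stated in this record.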
Pages: 1329-1330
Page count: 2