Identification of coding and non-coding sequences using local Holder exponent formalism

Cited by: 19
Authors
Kulkarni, OC [1]
Vigneshwar, R [1]
Jayaraman, VK [1]
Kulkarni, BD [1]
Affiliation
[1] Natl Chem Lab, Pune 411008, Maharashtra, India
DOI
10.1093/bioinformatics/bti639
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Accurate prediction of genes in genomes has always been a challenging task for bioinformaticians and computational biologists. The discovery of distinct scaling relations in coding and non-coding sequences has led to new perspectives on DNA sequences. This motivated us to exploit differences in the local singularity distributions to characterize and classify coding and non-coding sequences.
Results: The local singularity density distribution in the coding and non-coding sequences of four genomes was first estimated using the wavelet transform modulus maxima methodology. A support vector machine classifier was then trained on the extracted features. The trained classifier achieves an average test accuracy of 97.7%. The local singularity features of a DNA sequence can thus be exploited to identify coding and non-coding sequences.
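The feature-extraction step described in the abstract can be illustrated with a rough sketch. The code below is not the authors' implementation: it replaces the full wavelet transform modulus maxima procedure with a simpler per-position regression of log|W(a, x)| against log a, which gives only a crude estimate of the local Hölder exponent. The purine/pyrimidine walk mapping, the Mexican hat wavelet, the scale set, the histogram range, and all function names are assumptions made for illustration. The resulting histogram is the kind of feature vector that could then be passed to a support vector machine classifier.

```python
import math

def dna_walk(seq):
    """Cumulative walk: +1 for purines (A/G), -1 otherwise (assumed mapping)."""
    walk, total = [], 0
    for base in seq.upper():
        total += 1 if base in "AG" else -1
        walk.append(total)
    return walk

def mexican_hat(t):
    """Second derivative of a Gaussian, up to sign and normalization."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def cwt_at(signal, pos, scale):
    """Wavelet coefficient at one position and scale (direct sum, 1/scale norm).

    The sum is truncated at the sequence ends, so values near the
    boundaries are affected by edge effects.
    """
    half = int(5 * scale)
    acc = 0.0
    for k in range(max(0, pos - half), min(len(signal), pos + half + 1)):
        acc += signal[k] * mexican_hat((k - pos) / scale)
    return acc / scale

def local_exponents(signal, scales=(2, 4, 8, 16)):
    """Least-squares slope of log|W| versus log(scale) at each position.

    This is a simplified stand-in for WTMM: it does not trace maxima lines.
    """
    log_a = [math.log(a) for a in scales]
    mean_a = sum(log_a) / len(log_a)
    var_a = sum((x - mean_a) ** 2 for x in log_a)
    exponents = []
    for pos in range(len(signal)):
        log_w = [math.log(abs(cwt_at(signal, pos, a)) + 1e-12) for a in scales]
        mean_w = sum(log_w) / len(log_w)
        cov = sum((x - mean_a) * (y - mean_w) for x, y in zip(log_a, log_w))
        exponents.append(cov / var_a)
    return exponents

def exponent_histogram(exponents, lo=-1.0, hi=2.0, bins=10):
    """Normalized histogram of exponents; out-of-range values are clamped."""
    counts = [0] * bins
    for h in exponents:
        idx = int((h - lo) / (hi - lo) * bins)
        counts[min(max(idx, 0), bins - 1)] += 1
    n = len(exponents) or 1
    return [c / n for c in counts]

# Hypothetical usage on a toy sequence (not data from the paper).
features = exponent_histogram(local_exponents(dna_walk("ATGCGTACGTTAGC" * 5)))
```

In a fuller pipeline, one such feature vector would be computed per labelled coding or non-coding segment before training the classifier.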
Pages: 3818-3823
Page count: 6