Identification of coding and non-coding sequences using local Hölder exponent formalism

Cited by: 19
Authors
Kulkarni, OC [1 ]
Vigneshwar, R [1 ]
Jayaraman, VK [1 ]
Kulkarni, BD [1 ]
Affiliations
[1] Natl Chem Lab, Pune 411008, Maharashtra, India
DOI
10.1093/bioinformatics/bti639
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Accurate prediction of genes in genomes has always been a challenging task for bioinformaticians and computational biologists. The discovery of distinct scaling relations in coding and non-coding sequences has led to new perspectives in the understanding of DNA sequences. This has motivated us to exploit the differences in the local singularity distributions for characterization and classification of coding and non-coding sequences.
Results: The local singularity density distribution in the coding and non-coding sequences of four genomes was first estimated using the wavelet transform modulus maxima methodology. A support vector machine classifier was then trained with the extracted features. The trained classifier provides an average test accuracy of 97.7%. The local singularity features in a DNA sequence can therefore be exploited for successful identification of coding and non-coding sequences.
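The feature-extraction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the purine/pyrimidine walk encoding, the Mexican-hat wavelet, the scale set, the histogram range, and all function names are assumptions made for illustration. It also replaces the full wavelet transform modulus maxima procedure, which follows maxima lines across scales, with a simpler pointwise fit of log |W(b, a)| against log a.

```python
import math

def dna_walk(seq):
    """Cumulative walk: +1 for purines (A/G), -1 otherwise (assumed encoding)."""
    walk, total = [], 0
    for base in seq.upper():
        total += 1 if base in "AG" else -1
        walk.append(total)
    return walk

def mexican_hat(t):
    """Second derivative of a Gaussian, up to sign and normalisation."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def cwt_modulus(signal, scale):
    """|W(b, a)| at every position b for one scale a, by direct summation."""
    half = int(4 * scale)
    # The 1/scale normalisation makes |W| scale roughly as a**h near a
    # singularity with Holder exponent h.
    kernel = [mexican_hat(k / scale) / scale for k in range(-half, half + 1)]
    n = len(signal)
    out = []
    for b in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            i = min(max(b + j - half, 0), n - 1)  # clamp indices at the edges
            acc += w * signal[i]
        out.append(abs(acc))
    return out

def local_holder(signal, scales=(2, 4, 8, 16)):
    """Pointwise least-squares slope of log|W| versus log(scale)."""
    xs = [math.log(a) for a in scales]
    moduli = [cwt_modulus(signal, a) for a in scales]
    mean_x = sum(xs) / len(xs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    exponents = []
    for b in range(len(signal)):
        ys = [math.log(m[b] + 1e-12) for m in moduli]  # offset avoids log(0)
        mean_y = sum(ys) / len(ys)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        exponents.append(sxy / sxx)
    return exponents

def holder_histogram(exponents, lo=0.0, hi=1.5, bins=10):
    """Normalised histogram of local exponents, used as a feature vector."""
    counts = [0] * bins
    for h in exponents:
        k = int((h - lo) / (hi - lo) * bins)
        counts[min(max(k, 0), bins - 1)] += 1  # out-of-range values go to end bins
    return [c / len(exponents) for c in counts]

features = holder_histogram(local_holder(dna_walk("ATGCGTACGGA" * 6)))
```

In a pipeline like the one the abstract describes, one such feature vector per labelled coding or non-coding sequence would be passed to a support vector machine classifier; that training step is omitted here.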
Pages: 3818-3823
Number of pages: 6
Related papers
55 in total
[1]   Recognition of an organism from fragments of its complete genome [J].
Anh, V.V. ;
Lau, K.S. ;
Yu, Z.G. .
Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 2002, 66 (03) :1-031910
[2]  
[Anonymous], DATA MINING PRACTICA
[3]   THE THERMODYNAMICS OF FRACTALS REVISITED WITH WAVELETS [J].
ARNEODO, A ;
BACRY, E ;
MUZY, JF .
PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 1995, 213 (1-2) :232-275
[4]   Nucleotide composition effects on the long-range correlations in human genes [J].
Arneodo, A ;
d'Aubenton-Carafa, Y ;
Audit, B ;
Bacry, E ;
Muzy, JF ;
Thermes, C .
EUROPEAN PHYSICAL JOURNAL B, 1998, 1 (02) :259-263
[5]   Long-range correlations in genomic DNA: A signature of the nucleosomal structure [J].
Audit, B ;
Thermes, C ;
Vaillant, C ;
d'Aubenton-Carafa, Y ;
Muzy, JF ;
Arneodo, A .
PHYSICAL REVIEW LETTERS, 2001, 86 (11) :2471-2474
[6]   GENMARK - PARALLEL GENE RECOGNITION FOR BOTH DNA STRANDS [J].
BORODOVSKY, M ;
MCININCH, J .
COMPUTERS & CHEMISTRY, 1993, 17 (02) :123-133
[7]   Improved prediction of protein-protein binding sites using a support vector machines approach [J].
Bradford, JR ;
Westhead, DR .
BIOINFORMATICS, 2005, 21 (08) :1487-1494
[8]   Knowledge-based analysis of microarray gene expression data by using support vector machines [J].
Brown, MPS ;
Grundy, WN ;
Lin, D ;
Cristianini, N ;
Sugnet, CW ;
Furey, TS ;
Ares, M ;
Haussler, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (01) :262-267
[9]   Prediction of complete gene structures in human genomic DNA [J].
Burge, C ;
Karlin, S .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 268 (01) :78-94
[10]   A tutorial on Support Vector Machines for pattern recognition [J].
Burges, CJC .
DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2 (02) :121-167