A class of edit kernels for SVMs to predict translation initiation sites in eukaryotic mRNAs

Cited by: 38
Authors
Li, HF [1]
Jiang, T [1]
Affiliation
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
Keywords
translation initiation site; support vector machine; edit distance; mRNA
DOI
10.1089/cmb.2005.12.702
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
The prediction of translation initiation sites (TISs) in eukaryotic mRNAs has been a challenging problem in computational molecular biology. In this paper, we present a new algorithm that recognizes TISs with very high accuracy. Our algorithm includes two novel ideas. First, we introduce a class of new sequence-similarity kernels based on string editing, called edit kernels, for use with support vector machines (SVMs) in a discriminative approach to predicting TISs. The edit kernels are simple and have significant biological and probabilistic interpretations. Although the edit kernels are not positive definite, it is easy to make the kernel matrix positive definite by adjusting the parameters. Second, we convert the region of an input mRNA sequence downstream of a putative TIS into an amino acid sequence before applying SVMs, to avoid the high redundancy in the genetic code. The algorithm has been implemented and tested on previously published data. Our experimental results on real mRNA data show that both ideas improve the prediction accuracy greatly and that our method performs significantly better than those based on neural networks and SVMs with polynomial kernels or Salzberg kernels.
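The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the exponential form exp(-gamma * d), the unit-cost edit distance, the parameter name gamma, the helper names, and the choice to keep stop codons as '*' are all assumptions made for illustration. The abstract only states that the kernels are based on string editing and that the kernel matrix can be made positive definite by adjusting parameters.

```python
import math

# Standard genetic code (NCBI translation table 1), codons ordered over "TCAG".
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_downstream(mrna: str, tis_index: int) -> str:
    """Translate the region after a putative start codon at tis_index.

    Assumption: stop codons are kept as '*' and a trailing partial codon
    is dropped; the paper may treat both cases differently.
    """
    seq = mrna.upper().replace("U", "T")[tis_index + 3:]
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance using two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_kernel(a: str, b: str, gamma: float) -> float:
    """Assumed kernel form: similarity decays exponentially with edit distance."""
    return math.exp(-gamma * edit_distance(a, b))


def gram_matrix(seqs, gamma):
    """Kernel (Gram) matrix over a list of sequences."""
    return [[edit_kernel(s, t, gamma) for t in seqs] for s in seqs]


def is_positive_definite(m, tol=1e-12):
    """Attempt a Cholesky factorization; it succeeds only for PD matrices."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = m[i][i] - s
                if d <= tol:
                    return False
                low[i][i] = math.sqrt(d)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return True


if __name__ == "__main__":
    peptides = ["MKV", "MRV", "MKVL", "AAAA"]
    # A larger gamma shrinks the off-diagonal entries, pushing the matrix
    # toward the identity and hence toward positive definiteness.
    for gamma in (0.1, 1.0, 5.0):
        print(gamma, is_positive_definite(gram_matrix(peptides, gamma)))
```

In this sketch, increasing gamma is one way of "adjusting the parameters": once the off-diagonal entries are small enough, the matrix is strictly diagonally dominant and therefore positive definite. A matrix built this way could then be passed to an SVM implementation that accepts a precomputed kernel.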
Pages: 702-718
Page count: 17