The context-tree kernel for strings

被引:23
作者
Cuturi, M
Vert, JP
机构
[1] Ecole Mines Paris, Computat Biol Grp, F-77300 Fontainebleau, France
[2] Inst Stat Math, Minato Ku, Tokyo 1068569, Japan
关键词
string kernel; mutual information kernel; universal coding; protein homology detection;
D O I
10.1016/j.neunet.2005.07.010
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We propose a new kernel for strings which borrows ideas and techniques from information theory and data compression. This kernel can be used in combination with any kernel method, in particular Support Vector Machines for string classification. with notable applications in proteomics. By using a Bayesian averaging framework with conjugate priors on a class of Markovian models known as probabilistic suffix trees or context-trees, we compute the value of this kernel in linear time and space while only using the information contained in the spectrum of the considered strings. This is ensured through an adaptation of a compression method known as the context-tree weighting algorithm. Encouraging classification results are reported on a standard protein homology detection experiment, showing that the context-tree kernel performs well with respect to other state-of-the-art methods while using no biological prior knowledge. (c) 2005 Elsevier Ltd. All rights reserved.
引用
收藏
页码:1111 / 1123
页数:13
相关论文
共 33 条
[21]  
LESLIE C, 2003, ADV NEURAL INFORMATI, V15
[22]  
Leslie Christina, 2002, Pac Symp Biocomput, P564
[23]   The similarity metric [J].
Li, M ;
Chen, X ;
Li, X ;
Ma, B ;
Vitányi, PMB .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2004, 50 (12) :3250-3264
[24]  
NEMENMAN I, 2002, ADV NEURAL INFORMATI, V14, P46620
[25]  
NOBLE WS, 2002, P 6 ANN INT C RES CO, P225
[26]  
Schölkopf B, 2002, LECT NOTES ARTIF INT, V2430, P511
[27]  
Scholkopf B., 2004, Kernel Methods in Computational Biology
[28]  
Seeger M, 2002, ADV NEUR IN, V14, P905
[29]  
Smola A. J., 2002, Learning With Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
[30]  
Vapnik V., 1998, STAT LEARNING THEORY, V1, P2