Taming the curse of dimensionality in kernels and novelty detection

Times cited: 20
Authors
Evangelista, Paul F. [1]
Embrechts, Mark J. [2]
Szymanski, Boleslaw K. [3]
Affiliations
[1] US Mil Acad, Dept Syst Engn, West Point, NY 10996 USA
[2] Rensselaer Polytech Inst, Dept Decis Sci & Engn Syst, Troy, NY 12181 USA
[3] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY USA
Source
APPLIED SOFT COMPUTING TECHNOLOGIES: THE CHALLENGE OF COMPLEXITY | 2006 / Vol. 34
DOI
10.1007/3-540-31662-0_33
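Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligent science and technology]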
Abstract
The curse of dimensionality is a well-known but not entirely well-understood phenomenon. Too much data, in terms of the number of input variables, is not always a good thing. This is especially true when the problem involves unsupervised learning or supervised learning with unbalanced data (many negative observations but few positive observations). This paper addresses two issues involving high-dimensional data. The first part explores the behavior of kernels on high-dimensional data and shows that variance, especially when contributed by meaningless noisy variables, confounds learning methods. The second part illustrates methods that use subspace models to overcome dimensionality problems in unsupervised learning. The modeling approach involves novelty detection with the one-class SVM.
Pages: 425-438
Number of pages: 14