Mathematical programming in data mining

被引:86
作者
Mangasarian, OL [1 ]
机构
[1] Univ Wisconsin, Dept Comp Sci, Madison, WI 53706 USA
基金
美国国家科学基金会;
关键词
mathematical programming; feature selection; clustering; robust representation; POLYNOMIAL-TIME ALGORITHM;
D O I
10.1023/A:1009735908398
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Mathematical programming approaches to three fundamental problems will be described: feature selection, clustering and robust representation. The feature selection problem considered is that of discriminating between two sets while recognizing irrelevant and redundant features and suppressing them. This creates a lean model that often generalizes better to new unseen data. Computational results on real data confirm improved generalization of leaner models. Clustering is exemplified by the unsupervised learning of patterns and clusters that may exist in a given database and is a useful tool for knowledge discovery in databases (KDD). A mathematical programming formulation of this problem is proposed that is theoretically justifiable and computationally implementable in a finite number of steps. A resulting k-Median Algorithm is utilized to discover very useful survival curves for breast cancer patients from a medical database. Robust representation is concerned with minimizing trained model degradation when applied to new problems. A novel approach is proposed that purposely tolerates a small error in the training process in order to avoid overfitting data that may contain errors. Examples of applications of these concepts are given.
引用
收藏
页码:183 / 201
页数:19
相关论文
共 73 条
[31]   IMPROVED LINEAR-PROGRAMMING MODELS FOR DISCRIMINANT-ANALYSIS [J].
GLOVER, F .
DECISION SCIENCES, 1990, 21 (04) :771-785
[32]   MATHEMATICAL PROGRAMMING METHODS OF PATTERN CLASSIFICATION [J].
GRINOLD, RC .
MANAGEMENT SCIENCE SERIES A-THEORY, 1972, 19 (03) :272-289
[33]  
Hillier F., 2015, Introduction to operations Research
[34]  
Huber P., 2011, ROBUST STAT
[35]   ROBUST ESTIMATION OF LOCATION PARAMETER [J].
HUBER, PJ .
ANNALS OF MATHEMATICAL STATISTICS, 1964, 35 (01) :73-&
[36]  
JOHN G, 1994, P 11 INT C MACH LEAR
[37]   NONPARAMETRIC-ESTIMATION FROM INCOMPLETE OBSERVATIONS [J].
KAPLAN, EL ;
MEIER, P .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1958, 53 (282) :457-481
[38]  
KARLIN S, 1992, MATH METHODS THEORY, V1
[39]   A NEW POLYNOMIAL-TIME ALGORITHM FOR LINEAR-PROGRAMMING [J].
KARMARKAR, N .
COMBINATORICA, 1984, 4 (04) :373-395
[40]  
Klarbring A., 1993, chapter Mathematical Programming in Contact Problems, P233