New techniques for data reduction in a database system for knowledge discovery applications

Cited by: 32
Authors
Kumar, A [1]
Affiliations
[1] Univ Colorado, Coll Business, Boulder, CO 80309 USA
Keywords
semantic information preserving reduction; relational databases; selection; projection; classification; reduced information systems
DOI
10.1023/A:1008633406999
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Databases store large amounts of information about consumer transactions and other kinds of transactions. This information can be used to deduce rules about consumer behavior, and the rules can in turn be used to determine company policies, for instance with regard to production, marketing, and several other areas. Since databases typically store millions of records, and each record could have up to 100 or more attributes, as an initial step it is necessary to reduce the size of the database by eliminating attributes that do not influence the decision at all or do so very minimally. In this paper we present techniques that can be employed effectively for exact and approximate reduction in a database system. These techniques can be implemented efficiently in a database system using SQL (Structured Query Language) commands. We tested their performance on a real data set and validated them. The results showed that the classification performance actually improved with a reduced set of attributes as compared to the case when all the attributes were present. We also discuss how our techniques differ from statistical methods and other data reduction methods such as rough sets.
Pages: 31-48
Number of pages: 18