Logical analysis of binary data with missing bits

被引:16
作者
Boros, E
Ibaraki, T [1 ]
Makino, K
机构
[1] Kyoto Univ, Grad Sch Informat, Dept Appl Math & Phys, Kyoto 6068501, Japan
[2] Rutgers State Univ, RUTCOR, Piscataway, NJ 08854 USA
[3] Osaka Univ, Grad Sch Engn, Dept Syst & Human Sci, Toyonaka, Osaka 5608531, Japan
关键词
knowledge discovery; data mining; logical analysis of data; Boolean functions; partially defined Boolean functions; missing bits; NP-hardness;
D O I
10.1016/S0004-3702(98)00110-6
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We model a given pair of sets of positive and negative examples, each of which may contain missing components, as a partially defined Boolean function with missing bits (pBmb) ((T) over tilde, (F) over tilde), where (T) over tilde subset of or equal to {0, 1, *}(n) and (F) over tilde subset of or equal to {0, 1, *}(n), and "*" stands for a missing bit. Then we consider the problem of establishing a Boolean function (an extension) f : {0, 1}(n) --> {0, 1} belonging to a given function class C, such that f is true (respectively, false) for every vector in (T) over tilde (respectively, in (F) over tilde). This is a fundamental problem, encountered in many areas such as learning theory, pattern recognition, example-based knowledge bases, logical analysis of data, knowledge discovery and data mining. In this paper, depending upon how to deal with missing bits, we formulate three types of extensions called robust, consistent and most robust extensions, for various classes of Boolean functions such as general, positive, Hem, threshold, decomposable and k-DNF. The complexity of the associated problems are then clarified; some of them are solvable in polynomial time while the others are NP-hard. (C) 1999 Elsevier Science B.V. All rights reserved.
引用
收藏
页码:219 / 263
页数:45
相关论文
共 40 条
[1]  
Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
[2]   THE COMPLEXITY AND APPROXIMABILITY OF FINDING MAXIMUM FEASIBLE SUBSYSTEMS OF LINEAR RELATIONS [J].
AMALDI, E ;
KANN, V .
THEORETICAL COMPUTER SCIENCE, 1995, 147 (1-2) :181-210
[3]  
Angluin D., 1988, Machine Learning, V2, P319, DOI 10.1007/BF00116828
[4]  
[Anonymous], LNCS
[5]  
[Anonymous], 1979, Computers and Intractablity: A Guide to the Theoryof NP-Completeness
[6]  
ASHENHURST RL, 1959, P INT S THEOR SWIT 1, P75
[7]   LINEAR-TIME ALGORITHM FOR TESTING THE TRUTH OF CERTAIN QUANTIFIED BOOLEAN FORMULAS [J].
ASPVALL, B ;
PLASS, MF ;
TARJAN, RE .
INFORMATION PROCESSING LETTERS, 1979, 8 (03) :121-123
[8]  
Ben-David S., 1993, Proceeding of the Sixth Annual ACM Conference on Computational Learning Theory, P287, DOI 10.1145/168304.168353
[9]   OCCAM RAZOR [J].
BLUMER, A ;
EHRENFEUCHT, A ;
HAUSSLER, D ;
WARMUTH, MK .
INFORMATION PROCESSING LETTERS, 1987, 24 (06) :377-380
[10]  
Bores E, 1998, INFORM COMPUT, V140, P254, DOI 10.1006/inco.1997.2687