Scaling up inductive learning with massive parallelism

Cited by: 4
Authors
Provost, FJ [1 ]
Aronis, JM [1 ]
Affiliation
[1] UNIV PITTSBURGH, INTELLIGENT SYST LAB, PITTSBURGH, PA 15230 USA
Keywords
inductive learning; parallelism; small disjuncts
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning programs need to scale up to very large data sets for several reasons, including increasing accuracy and discovering infrequent special cases. Current inductive learners perform well with hundreds or thousands of training examples, but in some cases, up to a million or more examples may be necessary to learn important special cases with confidence. These tasks are infeasible for current learning programs running on sequential machines. We discuss the need for very large data sets and prior efforts to scale up machine learning methods. This discussion motivates a strategy that exploits the inherent parallelism present in many learning algorithms. We describe a parallel implementation of one inductive learning program on the CM-2 Connection Machine, show that it scales up to millions of examples, and show that it uncovers special-case rules that sequential learning programs, running on smaller datasets, would miss. The parallel version of the learning program is preferable to the sequential version for example sets larger than about 10K examples. When learning from a public-health database consisting of 3.5 million examples, the parallel rule-learning system uncovered a surprising relationship that has led to considerable follow-up research.
Pages: 33-46
Page count: 14