Pasting small votes for classification in large databases and on-line

Cited by: 212
Author
Breiman, L. [1]
Affiliation
[1] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94708 USA
Keywords
combining; database; votes; pasting
DOI
10.1023/A:1007563306331
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many databases have grown to the point where they cannot fit into the fast memory of even large-memory machines, to say nothing of current workstations. If we want to use these databases to construct predictions of various characteristics, then, since the usual methods require that all data be held in fast memory, various work-arounds have to be used. This paper studies one such class of methods, which are computationally fast and give accuracy comparable to that which could have been obtained if all data could have been held in core. The procedure takes small pieces of the data, grows a predictor on each small piece, and then pastes these predictors together. A version is given that scales up to terabyte data sets. The methods are also applicable to on-line learning.
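The procedure outlined in the abstract (small pieces, one predictor per piece, predictors pasted together) can be illustrated with a minimal sketch. This is not the paper's implementation: the uniform random sampling of pieces, the nearest-centroid base learner, the plurality vote, and every name below (`fit_centroids`, `paste_small_votes`, `vote`, `make_toy_data`) are assumptions chosen only to keep the example short and self-contained. The abstract does not specify how pieces are selected or what kind of predictor is grown.

```python
import random
from collections import Counter


def fit_centroids(points, labels):
    """Base learner (an assumption, not from the paper): one mean vector per class."""
    sums, counts = {}, Counter()
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroids(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: dist2(model[y]))


def paste_small_votes(points, labels, n_pieces=25, piece_size=20, seed=0):
    """Grow one predictor on each small piece of the data.

    Pieces are drawn uniformly at random with replacement (an assumption),
    so only `piece_size` rows need to be held in memory per predictor.
    """
    rng = random.Random(seed)
    models = []
    for _ in range(n_pieces):
        idx = [rng.randrange(len(points)) for _ in range(piece_size)]
        models.append(fit_centroids([points[i] for i in idx],
                                    [labels[i] for i in idx]))
    return models


def vote(models, x):
    """Paste the predictors together by plurality vote."""
    ballots = Counter(predict_centroids(m, x) for m in models)
    return ballots.most_common(1)[0][0]


def make_toy_data(n=1000, seed=1):
    """Hypothetical two-class data: Gaussian blobs centred at (0, 0) and (4, 4)."""
    rng = random.Random(seed)
    points, labels = [], []
    for _ in range(n):
        y = rng.randrange(2)
        center = 0.0 if y == 0 else 4.0
        points.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        labels.append(y)
    return points, labels


if __name__ == "__main__":
    points, labels = make_toy_data()
    models = paste_small_votes(points, labels)
    print(vote(models, [0.0, 0.0]), vote(models, [4.0, 4.0]))
```

Each predictor sees only 20 of the 1000 rows, which is the point of the approach: the full data set never has to sit in fast memory at once, and the combined vote can still be accurate on well-separated classes like these.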
Pages: 85-103
Page count: 19