XGBoost: A Scalable Tree Boosting System

Cited by: 27849
Authors
Chen, Tianqi [1]
Guestrin, Carlos [1]
Affiliation
[1] Univ Washington, Seattle, WA 98195 USA
Source
KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2016
Funding
National Science Foundation (USA);
Keywords
Large-scale Machine Learning;
DOI
10.1145/2939672.2939785
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
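The sparsity-aware split finding and quantile-sketch-based approximate tree learning summarized above are both exposed through the open-source xgboost Python package. The following minimal sketch shows sparse CSR input and the approximate tree method; the toy data, parameter values, and boosting round count are illustrative assumptions, not settings taken from the paper.

    # Minimal sketch: training XGBoost on sparse data (illustrative values).
    import numpy as np
    import scipy.sparse as sp
    import xgboost as xgb

    # Toy sparse design matrix: most entries are zero, which the
    # sparsity-aware split finding exploits via a learned default direction.
    rng = np.random.default_rng(0)
    X = sp.random(1000, 50, density=0.05, format="csr", random_state=0)
    y = rng.integers(0, 2, size=1000)

    dtrain = xgb.DMatrix(X, label=y)  # DMatrix accepts scipy CSR directly

    params = {
        "objective": "binary:logistic",
        "max_depth": 4,
        "eta": 0.1,                # learning rate (shrinkage)
        "tree_method": "approx",   # quantile-sketch-based approximate splits
    }
    booster = xgb.train(params, dtrain, num_boost_round=50)
    preds = booster.predict(dtrain)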
Pages: 785-794
Page count: 10