XGBoost: A Scalable Tree Boosting System

Cited by: 27849
Authors
Chen, Tianqi [1]
Guestrin, Carlos [1]
Affiliation
[1] Univ Washington, Seattle, WA 98195 USA
Source
KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2016
Funding
National Science Foundation (USA);
Keywords
Large-scale Machine Learning;
DOI
10.1145/2939672.2939785
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
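The sparsity-aware split finding and quantile-sketch-based approximate tree learning summarized above are both exposed through the open-source xgboost Python package. The following minimal sketch shows sparse CSR input and the approximate tree method; the toy data, parameter values, and boosting round count are illustrative assumptions, not settings taken from the paper.

    # Minimal sketch: training XGBoost on sparse data (illustrative values).
    import numpy as np
    import scipy.sparse as sp
    import xgboost as xgb

    # Toy sparse design matrix: most entries are zero, which the
    # sparsity-aware split finding exploits via a learned default direction.
    rng = np.random.default_rng(0)
    X = sp.random(1000, 50, density=0.05, format="csr", random_state=0)
    y = rng.integers(0, 2, size=1000)

    dtrain = xgb.DMatrix(X, label=y)  # DMatrix accepts scipy CSR directly

    params = {
        "objective": "binary:logistic",
        "max_depth": 4,
        "eta": 0.1,                # learning rate (shrinkage)
        "tree_method": "approx",   # quantile-sketch-based approximate splits
    }
    booster = xgb.train(params, dtrain, num_boost_round=50)
    preds = booster.predict(dtrain)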
Pages: 785-794
Page count: 10