PROXIMAL STOCHASTIC GRADIENT METHOD WITH PROGRESSIVE VARIANCE REDUCTION

Cited by: 428
Authors
Xiao, Lin [1 ]
Zhang, Tong [2 ,3 ]
Affiliations
[1] Microsoft Res, Machine Learning Grp, Redmond, WA 98052 USA
[2] Rutgers State Univ, Dept Stat, Piscataway, NJ 08854 USA
[3] Baidu Inc, Beijing 100085, Peoples R China
Funding
National Science Foundation (USA);
Keywords
stochastic gradient method; proximal mapping; variance reduction;
DOI
10.1137/140961791
Chinese Library Classification
O29 [Applied Mathematics];
Discipline Code
070104;
Abstract
We consider the problem of minimizing the sum of two convex functions: one is the average of a large number of smooth component functions, and the other is a general convex function that admits a simple proximal mapping. We assume the whole objective function is strongly convex. Such problems often arise in machine learning, where they are known as regularized empirical risk minimization. We propose and analyze a new proximal stochastic gradient method, which uses a multistage scheme to progressively reduce the variance of the stochastic gradient. While each iteration of this algorithm has a cost similar to that of the classical stochastic gradient method (or incremental gradient method), we show that the expected objective value converges to the optimum at a geometric rate. The overall complexity of this method is much lower than that of both the proximal full gradient method and the standard proximal stochastic gradient method.
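As a rough illustration of the multistage variance-reduction scheme described in the abstract, the sketch below applies a Prox-SVRG-style update to an l1-regularized least-squares problem, where the nonsmooth term has the soft-thresholding proximal mapping. The problem instance, step size, inner-loop length, and the choice of averaging the inner iterates are illustrative assumptions, not the paper's reference implementation.

import numpy as np

def soft_threshold(z, tau):
    # Proximal mapping of tau * ||.||_1 (soft-thresholding).
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_svrg(A, b, lam, eta=0.01, m=None, num_stages=20, seed=0):
    # Multistage proximal stochastic gradient with progressive variance
    # reduction (sketch). Smooth part: (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2;
    # nonsmooth part: lam * ||x||_1.
    rng = np.random.default_rng(seed)
    n, d = A.shape
    m = m or 2 * n                      # inner-loop length (heuristic choice)
    x_tilde = np.zeros(d)
    for _ in range(num_stages):
        # Full gradient of the smooth part at the reference point x_tilde.
        mu = A.T @ (A @ x_tilde - b) / n
        x = x_tilde.copy()
        iterates = []
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient:
            #   grad f_i(x) - grad f_i(x_tilde) + mu
            g = A[i] * (A[i] @ x - b[i]) - A[i] * (A[i] @ x_tilde - b[i]) + mu
            # Proximal step handles the nonsmooth regularizer.
            x = soft_threshold(x - eta * g, eta * lam)
            iterates.append(x)
        # New reference point: average of the inner iterates (one of the
        # options; taking the last iterate is another common choice).
        x_tilde = np.mean(iterates, axis=0)
    return x_tilde

# Usage example on synthetic data (hypothetical sizes).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
x_true = np.zeros(50); x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(200)
x_hat = prox_svrg(A, b, lam=0.1)

Each inner iteration touches a single component function, so its per-step cost matches a standard stochastic gradient step, while the full gradient mu is recomputed only once per stage.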
Pages: 2057-2075
Number of pages: 19