Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function

Cited by: 16

Authors:

Peter Richtárik

Martin Takáč

Affiliation:

[1] University of Edinburgh, School of Mathematics

Source:

Mathematical Programming | 2014 / Vol. 144

Keywords:

Block coordinate descent; Huge-scale optimization; Composite minimization; Iteration complexity; Convex optimization; LASSO; Sparse regression; Gradient descent; Coordinate relaxation; Gauss–Seidel method; 65K05; 90C05; 90C06; 90C25

DOI:

Not available


Abstract:

In this paper we develop a randomized block-coordinate descent method for minimizing the sum of a smooth and a simple nonsmooth block-separable convex function and prove that it obtains an $\varepsilon$-accurate solution with probability at least $1-\rho$ in at most $O((n/\varepsilon) \log (1/\rho))$ iterations, where $n$ is the number of blocks. This extends recent results of Nesterov (SIAM J Optim 22(2): 341–362, 2012), which cover the smooth case, to composite minimization, while at the same time improving the complexity by the factor of 4 and removing $\varepsilon$ from the logarithmic term.
More importantly, in contrast with the aforementioned work in which the author achieves the results by applying the method to a regularized version of the objective function with an unknown scaling factor, we show that this is not necessary, thus achieving first true iteration complexity bounds. For strongly convex functions the method converges linearly. In the smooth case we also allow for arbitrary probability vectors and non-Euclidean norms. Finally, we demonstrate numerically that the algorithm is able to solve huge-scale $\ell_1$-regularized least squares problems with a billion variables.
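
To make the kind of method described in the abstract concrete, here is a minimal sketch of randomized coordinate descent applied to $\ell_1$-regularized least squares, $F(x) = \tfrac{1}{2}\|Ax-b\|^2 + \lambda\|x\|_1$. It is not the authors' implementation. It assumes blocks of size one, uniform sampling probabilities, a small dense matrix stored as a list of rows, and coordinate-wise Lipschitz constants $L_i = \|A_{:,i}\|^2$. The names `rcd_lasso`, `soft_threshold` and `objective` are hypothetical and chosen only for illustration.

```python
import random


def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def objective(A, b, lam, x):
    """F(x) = 0.5*||Ax - b||^2 + lam*||x||_1 for a dense matrix A (list of rows)."""
    res = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return 0.5 * sum(r * r for r in res) + lam * sum(abs(xj) for xj in x)


def rcd_lasso(A, b, lam, num_iters, seed=0):
    """Randomized coordinate descent sketch (assumed: uniform sampling, blocks of size 1)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    cols = [[A[r][j] for r in range(m)] for j in range(n)]
    # Coordinate-wise Lipschitz constants of the smooth part: L_i = ||A[:, i]||^2.
    lips = [sum(v * v for v in col) for col in cols]
    x = [0.0] * n
    residual = [-bi for bi in b]  # equals A x - b at the starting point x = 0
    for _ in range(num_iters):
        i = rng.randrange(n)  # pick one coordinate uniformly at random
        if lips[i] == 0.0:
            continue  # an all-zero column cannot change the smooth part
        grad_i = sum(c * r for c, r in zip(cols[i], residual))
        # Gradient step on coordinate i followed by the l1 proximal step.
        new_xi = soft_threshold(x[i] - grad_i / lips[i], lam / lips[i])
        delta = new_xi - x[i]
        if delta != 0.0:
            # Keep the residual A x - b up to date without recomputing A x.
            for r in range(m):
                residual[r] += delta * cols[i][r]
            x[i] = new_xi
    return x
```

A call such as `rcd_lasso(A, b, lam=1.0, num_iters=200)` returns the final iterate, and `objective(A, b, lam, x)` can be used to monitor progress. Maintaining the residual incrementally keeps the cost of one iteration proportional to the length of a single column, which is the kind of low per-iteration cost that makes coordinate methods attractive at very large scale.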

Pages: 1-38

Page count: 37
