Distributed optimization and statistical learning via the alternating direction method of multipliers

Cited by: 11116
Authors
Boyd S. [1]
Parikh N. [2]
Chu E. [1]
Peleato B. [1]
Eckstein J. [3]
Affiliations
[1] Electrical Engineering Department, Stanford University, Stanford
[2] Computer Science Department, Stanford University, Stanford
[3] Management Science and Information Systems Department, RUTCOR, Rutgers University, Piscataway
Source
Foundations and Trends in Machine Learning | 2010 / Volume 3 / Issue 1
DOI
10.1561/2200000016
Abstract
Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas-Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for ℓ1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations. © 2011 S. Boyd, N. Parikh, E. Chu, B. Peleato and J. Eckstein.
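
As context for the applications the abstract lists: for the lasso, each ADMM iteration reduces to a ridge-regression solve, a soft-thresholding step, and a dual update. The following minimal NumPy sketch of the scaled-form iterations is illustrative only; the function name lasso_admm, the parameter defaults, and the fixed iteration count are assumptions for this example, not taken from the entry.

    import numpy as np

    def lasso_admm(A, b, lam, rho=1.0, n_iter=200):
        # Scaled-form ADMM for: minimize (1/2)*||A x - b||_2^2 + lam*||x||_1
        n = A.shape[1]
        # Factor (A^T A + rho*I) once; every x-update reuses this Cholesky factor.
        L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
        Atb = A.T @ b
        x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
        for _ in range(n_iter):
            # x-update: ridge-regression subproblem, solved via the cached factor.
            x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
            # z-update: soft thresholding, the proximal operator of the l1 norm.
            v = x + u
            z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
            # u-update: scaled dual variable accumulates the residual x - z.
            u = u + x - z
        return z

Here z is returned as the sparse iterate; a practical implementation would replace the fixed iteration count with the primal/dual residual stopping criteria the review describes.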
Pages: 1-122
Page count: 122