Federated Learning Over Wireless Fading Channels

Cited by: 431
Authors
Amiri, Mohammad Mohammadi [1, 2]
Gunduz, Deniz [1]
Affiliations
[1] Imperial Coll London, Dept Elect & Elect Engn, London SW7 2BU, England
[2] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA
Funding
European Research Council
Keywords
Wireless communication; Bandwidth; Fading channels; Channel estimation; Performance evaluation; Stochastic processes; Training; Approximate message passing (AMP); federated learning (FL); over-the-air computation; stochastic gradient descent (SGD);
DOI
10.1109/TWC.2020.2974748
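Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract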
We study federated machine learning at the wireless network edge, where limited power wireless devices, each with its own dataset, build a joint model with the help of a remote parameter server (PS). We consider a bandwidth-limited fading multiple access channel (MAC) from the wireless devices to the PS, and propose various techniques to implement distributed stochastic gradient descent (DSGD) over this shared noisy wireless channel. We first propose a digital DSGD (D-DSGD) scheme, in which one device is selected opportunistically for transmission at each iteration based on the channel conditions; the scheduled device quantizes its gradient estimate to a finite number of bits imposed by the channel condition, and transmits these bits to the PS in a reliable manner. Next, motivated by the additive nature of the wireless MAC, we propose a novel analog communication scheme, referred to as the compressed analog DSGD (CA-DSGD), where the devices first sparsify their gradient estimates while accumulating error from previous iterations, and project the resultant sparse vector into a low-dimensional vector for bandwidth reduction. We also design a power allocation scheme to align the received gradient vectors at the PS in an efficient manner. Numerical results show that D-DSGD outperforms other digital approaches in the literature; however, in general the proposed CA-DSGD algorithm converges faster than the D-DSGD scheme, and reaches a higher level of accuracy. We have observed that the gap between the analog and digital schemes increases when the datasets of devices are not independent and identically distributed (i.i.d.). Furthermore, the performance of the CA-DSGD scheme is shown to be robust against imperfect channel state information (CSI) at the devices. 
Overall, these results show clear advantages for the proposed analog over-the-air DSGD scheme, which suggests that learning and communication algorithms should be designed jointly to achieve the best end-to-end performance in machine learning applications at the wireless edge.
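The device-side steps that the abstract attributes to CA-DSGD (sparsification with error accumulation, then projection to a lower dimension, then superposition over the MAC) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the top-k sparsification rule, the Gaussian projection matrix shared by all devices, and the unit channel gains (standing in for ideal power alignment) are all assumptions made for illustration. Recovery of the aggregate gradient at the PS is omitted.

```python
import numpy as np

def sparsify_with_error_feedback(gradient, residual, k):
    """Keep the k largest-magnitude entries of (gradient + residual).

    Returns the sparse vector and the new residual, i.e. the part
    that was dropped and is carried over to the next iteration.
    Top-k selection is an assumption of this sketch.
    """
    corrected = gradient + residual
    keep = np.argpartition(np.abs(corrected), -k)[-k:]
    sparse = np.zeros_like(corrected)
    sparse[keep] = corrected[keep]
    return sparse, corrected - sparse

def project(sparse, projection):
    """Map a length-d sparse vector to s < d channel uses (assumed linear map)."""
    return projection @ sparse

def over_the_air_sum(signals, noise_std, rng):
    """Model the MAC as a noisy sum; unit channel gains are assumed,
    i.e. power allocation has already aligned the signals perfectly."""
    superposed = np.sum(signals, axis=0)
    return superposed + noise_std * rng.standard_normal(superposed.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, s, k, num_devices = 1000, 200, 20, 4  # hypothetical sizes
    projection = rng.standard_normal((s, d)) / np.sqrt(s)  # shared by all devices
    residuals = [np.zeros(d) for _ in range(num_devices)]

    transmitted = []
    for m in range(num_devices):
        gradient = rng.standard_normal(d)  # stand-in for a local gradient estimate
        sparse, residuals[m] = sparsify_with_error_feedback(gradient, residuals[m], k)
        transmitted.append(project(sparse, projection))

    received = over_the_air_sum(transmitted, noise_std=0.1, rng=rng)
    print(received.shape)
```

Because the projection is linear and shared, the noiseless received signal equals the projection of the sum of the sparse vectors, which is what lets the channel itself perform the aggregation in this simplified model.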
Pages: 3546-3557
Number of pages: 12