A General Communication Cost Optimization Framework for Big Data Stream Processing in Geo-Distributed Data Centers

Cited by: 81
Authors
Gu, Lin [1]
Zeng, Deze [3]
Guo, Song [2]
Xiang, Yong [4]
Hu, Jiankun [5]
Affiliations
[1] Huazhong Univ Sci & Technol, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
[2] Univ Aizu, Aizu Wakamatsu, Fukushima 9658580, Japan
[3] China Univ Geosci, Wuhan 430071, Hubei, Peoples R China
[4] Deakin Univ, Sch Informat Technol, Melbourne, Vic 3125, Australia
[5] Univ New S Wales, Sch Informat Technol & Engn, Canberra, ACT 2600, Australia
Keywords
Big data; stream processing; network cost minimization; VM placement; geo-distributed data centers; VIRTUAL MACHINE PLACEMENT; EFFICIENCY;
DOI
10.1109/TC.2015.2417566
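Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
080201 [Mechanical Manufacturing and Automation];
Abstract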
With the explosion of big data, processing large numbers of continuous data streams, i.e., big data stream processing (BDSP), has become a crucial requirement for many scientific and industrial applications in recent years. By offering a pool of computation, communication, and storage resources, public clouds such as Amazon's EC2 are undoubtedly the most efficient platforms to meet the ever-growing needs of BDSP. Public cloud service providers usually operate a number of geo-distributed datacenters across the globe. Different datacenter pairs incur different inter-datacenter network costs charged by Internet Service Providers (ISPs). Meanwhile, inter-datacenter traffic in BDSP constitutes a large portion of a cloud provider's traffic demand over the Internet and incurs substantial communication cost, which may even become the dominant operational expenditure factor. As the datacenter resources are provided in a virtualized way, the virtual machines (VMs) for stream processing tasks can be freely deployed onto any datacenter, provided that the Service Level Agreement (SLA, e.g., quality-of-information) is obeyed. This raises an opportunity, but also a challenge, to explore the inter-datacenter network cost diversity to optimize both VM placement and load balancing towards network cost minimization with guaranteed SLA. In this paper, we first propose a general modeling framework that describes all representative inter-task relationship semantics in BDSP. Based on our novel framework, we then formulate the communication cost minimization problem for BDSP as a mixed-integer linear programming (MILP) problem and prove it to be NP-hard. We then propose a computation-efficient solution based on MILP. The high efficiency of our proposal is validated by extensive simulation-based studies.
Pages: 19-29
Number of pages: 11