Trends in big data analytics

Cited by: 442
Authors
Kambatla, Karthik [1]
Kollias, Giorgos [2]
Kumar, Vipin [3]
Grama, Ananth [1]
Affiliations
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
[3] Univ Minnesota, Dept Comp Sci, Minneapolis, MN 55455 USA
Funding
National Science Foundation (USA);
Keywords
Big-data; Analytics; Data centers; Distributed systems; MODEL;
DOI
10.1016/j.jpdc.2014.01.003
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
One of the major applications of future generation parallel and distributed systems is in big-data analytics. Data repositories for such applications currently exceed exabytes and are rapidly increasing in size. Beyond their sheer magnitude, these datasets and associated applications' considerations pose significant challenges for method and software development. Datasets are often distributed and their size and privacy considerations warrant distributed techniques. Data often resides on platforms with widely varying computational and network capabilities. Considerations of fault-tolerance, security, and access control are critical in many applications (Dean and Ghemawat, 2004; Apache Hadoop). Analysis tasks often have hard deadlines, and data quality is a major concern in yet other applications. For most emerging applications, data-driven models and methods, capable of operating at scale, are as-yet unknown. Even when known methods can be scaled, validation of results is a major issue. Characteristics of hardware platforms and the software stack fundamentally impact data analytics. In this article, we provide an overview of the state-of-the-art and focus on emerging trends to highlight the hardware, software, and application landscape of big-data analytics. © 2014 Elsevier Inc. All rights reserved.
Pages: 2561-2573
Number of pages: 13
Related papers
75 in total
[41]   Helios: A Hybrid Electrical/Optical Switch Architecture for Modular Data Centers [J].
Farrington, Nathan ;
Porter, George ;
Radhakrishnan, Sivasankar ;
Bazzaz, Hamid Hajabdolali ;
Subramanya, Vikram ;
Fainman, Yeshaiahu ;
Papen, George ;
Vahdat, Amin .
ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2010, 40 (04) :339-350
[42]   BCube: A High Performance, Server-centric Network Architecture for Modular Data Centers [J].
Guo, Chuanxiong ;
Lu, Guohan ;
Li, Dan ;
Wu, Haitao ;
Zhang, Xuan ;
Shi, Yunfeng ;
Tian, Chen ;
Zhang, Yongguang ;
Lu, Songwu .
ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2009, 39 (04) :63-74
[43]   Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions [J].
Halko, N. ;
Martinsson, P. G. ;
Tropp, J. A. .
SIAM REVIEW, 2011, 53 (02) :217-288
[44]   TOWARD DARK SILICON IN SERVERS [J].
Hardavellas, Nikos ;
Ferdman, Michael ;
Falsafi, Babak ;
Ailamaki, Anastasia .
IEEE MICRO, 2011, 31 (04) :6-15
[45]   Rate-Based QoS Techniques for Cache/Memory in CMP Platforms [J].
Herdrich, Andrew ;
Illikkal, Ramesh ;
Iyer, Ravi ;
Newell, Don ;
Chadha, Vineet ;
Moses, Jaideep .
ICS'09: PROCEEDINGS OF THE 2009 ACM SIGARCH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, 2009, :479-488
[46]  
Hölzle U, 2010, IEEE MICRO, V30, P20, DOI 10.1109/MM.2010.61
[47]  
Hunt P., 2010, P USENIX ANN TECH C, P11, DOI 10.5555/1855840.1855851
[48]  
Isard M., 2007, EUROSYS
[49]  
Iyer Ravi, 2007, Intel Technology Journal, V11, P227, DOI 10.1535/itj.1103.06
[50]  
Kambatla Karthik, 2010, IEEE INT C CLUSTER C