Efficient Machine Learning for Big Data: A Review

Cited by: 416
Authors
Al-Jarrah, Omar Y. [1]
Yoo, Paul D. [2]
Muhaidat, Sami [3]
Karagiannidis, George K. [1, 4]
Taha, Kamal [1]
Affiliations
[1] Khalifa Univ, Abu Dhabi, U Arab Emirates
[2] Bournemouth Univ, Data Sci Inst, Poole BH12 5BB, Dorset, England
[3] Univ Surrey, Guildford GU2 5XH, Surrey, England
[4] Aristotle Univ Thessaloniki, GR-54006 Thessaloniki, Greece
Keywords
Big data; Green computing; Efficient machine learning; Computational modeling
DOI
10.1016/j.bdr.2015.04.001
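Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligent science and technology]
Abstract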
With the emerging technologies and all associated devices, it is predicted that a massive amount of data will be created in the next few years. In fact, as much as 90% of current data were created in the last couple of years, a trend that will continue for the foreseeable future. Sustainable computing studies the process by which computer engineers and scientists design computers and associated subsystems efficiently and effectively, with minimal impact on the environment. However, current intelligent machine-learning systems are performance driven: the focus is on predictive/classification accuracy, based on known properties learned from the training samples. For instance, most machine-learning-based nonparametric models are known to require high computational cost in order to find the global optima. When the learning task involves a large dataset, the number of hidden nodes within the network increases significantly, which eventually leads to an exponential rise in computational complexity. This paper thus reviews the theoretical and experimental data-modeling literature in large-scale data-intensive fields, relating to: (1) model efficiency, including computational requirements in learning, and the structure and design of data-intensive areas; and introduces (2) new algorithmic approaches with the least memory and processing requirements to minimize computational cost, while maintaining or improving predictive/classification accuracy and stability. (C) 2015 Elsevier Inc. All rights reserved.
Pages: 87-93
Number of pages: 7