Distributed data mining in grid computing environments

Times cited: 17
Authors
Luo, Ping
Lu, Kevin [1]
Shi, Zhongzhi
He, Qing
Affiliations
[1] Brunel Univ, Uxbridge UB8 3PH, Middx, England
[2] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100080, Peoples R China
[3] Chinese Acad Sci, Grad Sch, Beijing 100080, Peoples R China
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2007 / Vol. 23 / No. 01
Funding
National Natural Science Foundation of China
Keywords
distributed data mining; directed acyclic graph; InterGrid; IntraGrid; multi-agent system environment;
DOI
10.1016/j.future.2006.04.010
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Computing-intensive data mining over inherently Internet-wide distributed data, referred to as Distributed Data Mining (DDM), calls for the support of a powerful Grid with an effective scheduling framework. DDM often follows the computing paradigm of local processing followed by global synthesis. It involves every phase of the Data Mining (DM) process, which makes the DDM workflow very complex, so that it can be modelled only by a Directed Acyclic Graph (DAG) with multiple data entries. Motivated by the need for a practical solution to the Grid scheduling problem for the DDM workflow, this paper proposes a novel two-phase scheduling framework, comprising External Scheduling and Internal Scheduling, on a two-level Grid architecture (InterGrid, IntraGrid). Currently a DM IntraGrid, named DMGCE (Data Mining Grid Computing Environment), has been developed with a dynamic scheduling framework for competitive DAGs in a heterogeneous computing environment. This system is implemented in an established Multi-Agent System (MAS) environment, in which existing DM algorithms are reused by encapsulating them into agents. Practical classification problems from oil well logging analysis are used to measure the system performance. The detailed experiment procedure and result analysis are also discussed in this paper. © 2006 Elsevier B.V. All rights reserved.
Pages: 84-91
Number of pages: 8