The Ganglia distributed monitoring system: design, implementation, and experience

Cited by: 776
Authors
Massie, ML
Chun, BN
Culler, DE
Affiliations
[1] Intel Res Berkeley, Berkeley, CA 94704 USA
[2] Univ Calif Berkeley, Comp Sci Div, Berkeley, CA 94720 USA
Funding
U.S. National Science Foundation
Keywords
monitoring; clusters; distributed systems
DOI
10.1016/j.parco.2004.04.001
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202 [Computer Software and Theory]
Abstract
Ganglia is a scalable distributed monitoring system for high performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It relies on a multicast-based listen/announce protocol to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on over 500 clusters around the world. This paper presents the design, implementation, and evaluation of Ganglia along with experience gained through real world deployments on systems of widely varying scale, configurations, and target application domains over the last two and a half years. (C) 2004 Elsevier B.V. All rights reserved.
Pages: 817-840
Number of pages: 24