High Performance Big Data Clustering

Cited by: 4
Authors
Agrawal, Ankit [1 ]
Patwary, Md. Mostofa Ali [1 ]
Hendrix, William [1 ]
Liao, Wei-keng [1 ]
Choudhary, Alok [1 ]
Affiliations
[1] Northwestern Univ, Dept EECS, Evanston, IL 60208 USA
Source
CLOUD COMPUTING AND BIG DATA | 2013 / Vol. 23
Keywords
big data; clustering; density-based clustering; hierarchical clustering; DBSCAN ALGORITHM; PARALLEL
DOI
10.3233/978-1-61499-322-3-192
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Scientific advances are collectively exploding the amount, diversity, and complexity of available data. Our ability to collect huge amounts of data has greatly surpassed our analytical capacity to make sense of it. Efficient use of high performance computing techniques is critical to the success of the data-driven paradigm of scientific discovery. Data clustering is one of the fundamental analytics tasks, heavily relied upon in many application domains such as astrophysics, climate science, and bioinformatics. In this book chapter, we illustrate the challenges and opportunities in mining big data using two recently developed scalable parallel clustering algorithms. Experimental results on millions of high-dimensional data points clustered in parallel on thousands of processor cores are also presented.
Pages: 192-211
Page count: 20