PARALLEL PROCESSING OF BIG POINT CLOUDS USING Z-ORDER-BASED PARTITIONING

Cited by: 3
Authors
Alis, C. [1]
Boehm, J. [1 ]
Liu, K. [1 ]
Affiliations
[1] UCL, Dept Civil Environm & Geomat Engn, Gower St, London, England
Source
XXIII ISPRS Congress, Commission II | 2016 / Vol. 41 / Issue B2
Keywords
Z-order; Spatial partitioning; Big Data; LiDAR; Point Cloud; Apache Spark; Spatial Data
DOI
10.5194/isprsarchives-XLI-B2-71-2016
Chinese Library Classification (CLC)
P9 [Physical Geography]
Subject Classification Codes
0705; 070501
Abstract
As laser scanning technology improves and costs come down, the amount of point cloud data being generated can be prohibitively difficult and expensive to process on a single machine. This data explosion is not limited to point clouds: voluminous, high-dimensional and rapidly accumulating data, collectively known as Big Data, such as those generated by social media, Internet of Things devices and commercial transactions, are becoming more prevalent as well. New computing paradigms and frameworks are being developed to handle the processing of Big Data efficiently, many of which utilise a compute cluster composed of several commodity-grade machines to process chunks of data in parallel. A central concept in many of these frameworks is data locality. By its nature, Big Data is large enough that the entire dataset would not fit in the memory and on the hard drives of a single node; hence replicating the entire dataset to each worker node is impractical. The data must instead be partitioned across worker nodes in a manner that minimises data transfer across the network. This is a challenge for point cloud data because the data can be partitioned in different ways, some of which require substantial data transfer between nodes. We propose a partitioning based on the Z-order, which is a form of locality-sensitive hashing. The Z-order, or Morton code, is computed by dividing each dimension to form a grid and then interleaving the bits of the binary representation of each dimension. For example, the Z-order code for the grid square with coordinates (x = 1 = 01₂, y = 3 = 11₂) is 1011₂ = 11. The number of points in each partition is controlled by the number of bits per dimension: the more bits, the fewer the points. The number of bits per dimension also controls the level of detail, with more bits yielding a finer partitioning. We present this partitioning method by implementing it on Apache Spark and investigating how different parameters affect the accuracy and running time of the k nearest neighbour algorithm for a hemispherical and a triangular wave point cloud.
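The bit-interleaving step described in the abstract can be sketched in a few lines of Python. The snippet below is only a minimal illustration of a 2-D Morton code computation, not the authors' Spark implementation; the function name morton2d and the bits parameter are illustrative assumptions.

def morton2d(x: int, y: int, bits: int) -> int:
    # Interleave the low `bits` bits of grid indices x and y into a Z-order (Morton) code.
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x supplies the even bit positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y supplies the odd bit positions
    return code

# Worked example from the abstract: x = 1 (binary 01), y = 3 (binary 11)
# interleave to binary 1011, i.e. 11.
assert morton2d(1, 3, bits=2) == 0b1011 == 11

With more bits per dimension the grid becomes finer, so each Z-order code covers a smaller cell and therefore fewer points, which is the trade-off between partition size and level of detail described in the abstract.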
Pages: 71-77
Page count: 7