Dublin point clouds on Dumbo cluster

Extract point cloud data from the HBase database on the NYU Dumbo cluster

June 29, 2018

The 2015 Dublin LiDAR point cloud and two older laser scanning datasets of Dublin have been ingested onto the NYU Dumbo cluster. The data are accessible to anyone with access to the cluster. This post contains code snippets you can use to extract point cloud subsets from the LiDAR database. The database is being developed as part of the Ariadne3D project. The related publication explaining the database design is available for download at the link below.



  • You need an NYU HPC account. See these instructions if you need to sign up for one.

  • You need to be on the login-1-1 node to work with Dumbo’s HBase. If you are assigned to login-2-1 when you first log in, run ssh login-1-1 to switch.

  • Upload the jar file listed below (i.e. lidardb-0.0-jar-with-dependencies.jar) to a directory on Dumbo that you have access to (e.g. your home directory).


pcextract - extract a point cloud subset

Purpose: Clip a portion of the point cloud dataset using a 3D, rectilinear query box.

java -jar [path_to_jar_file] pcextract -t [table_name] -output [path_to_output_file] -clipping_box [xmin ymin zmin xmax ymax zmax]

Example code snippets

Query by the original point coordinates (Irish TM75)

java -jar $HOME/lidardb-0.0-jar-with-dependencies.jar pcextract -t dublin15 -output $HOME/out.las -c 316000 234000 0 316010 234010 50

The above command extracts from table dublin15 the points with x in [316000, 316010], y in [234000, 234010], and z in [0, 50]. The extracted points are saved to $HOME/out.las in the LAS format. You can use this 3D viewer to locate the data regions you need and their coordinates. The point picking tool on the control panel on the left side of the viewer lets you read the coordinates.
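It is easy to mix up the order of the six box values, so a small wrapper that checks the box before building the command can help. The sketch below is only an illustration: pcextract_cmd is a hypothetical helper name (not part of lidardb), and it assumes the jar sits in your home directory as in the example above. It prints the command rather than running it, so you can try it anywhere.

```shell
# Path to the uploaded jar (assumed location; adjust if you stored it elsewhere).
JAR="${JAR:-$HOME/lidardb-0.0-jar-with-dependencies.jar}"

# Hypothetical helper, not part of lidardb: print the pcextract command for a
# clipping box after checking that every minimum is below its maximum.
pcextract_cmd() {
    if [ "$#" -ne 8 ]; then
        echo "usage: pcextract_cmd TABLE OUTPUT XMIN YMIN ZMIN XMAX YMAX ZMAX" >&2
        return 2
    fi
    table=$1; output=$2
    xmin=$3; ymin=$4; zmin=$5
    xmax=$6; ymax=$7; zmax=$8
    # awk compares decimal and negative values, which [ -lt ] cannot do.
    if ! awk -v a="$xmin" -v b="$xmax" -v c="$ymin" -v d="$ymax" \
             -v e="$zmin" -v f="$zmax" \
             'BEGIN { exit !(a < b && c < d && e < f) }'; then
        echo "error: each min value must be smaller than its max value" >&2
        return 1
    fi
    echo "java -jar $JAR pcextract -t $table -output $output -c $xmin $ymin $zmin $xmax $ymax $zmax"
}

# Print the command for the example box shown above.
pcextract_cmd dublin15 "$HOME/out.las" 316000 234000 0 316010 234010 50
```

Once the printed command looks right, you can paste it into your shell on login-1-1.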

Currently, Dumbo’s HBase is loaded with 3 LiDAR scans of Dublin. The table name dublin15 in the command can be replaced by dublin14 or dublin07 if the older LiDAR scans of Dublin (acquired in 2014 and 2007) are needed.
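If you want the same region from all three scans, for example to compare them, a loop over the table names saves some typing. This is a minimal sketch under a few assumptions: the same box coordinates are meaningful for all three tables, the output file names (out_dublin15.las and so on) are made up for illustration, and DRY_RUN is a hypothetical switch of this script, not a pcextract option. By default it only prints the commands.

```shell
# Path to the uploaded jar (assumed location; adjust if you stored it elsewhere).
JAR="${JAR:-$HOME/lidardb-0.0-jar-with-dependencies.jar}"

# Clipping box from the example above: xmin ymin zmin xmax ymax zmax.
BOX="316000 234000 0 316010 234010 50"

# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 on login-1-1 to run them.
DRY_RUN="${DRY_RUN:-1}"

extract_all_scans() {
    for table in dublin15 dublin14 dublin07; do
        # Illustrative output name, one LAS file per scan.
        out="$HOME/out_$table.las"
        if [ "$DRY_RUN" = 1 ]; then
            echo "java -jar $JAR pcextract -t $table -output $out -c $BOX"
        else
            # $BOX is left unquoted on purpose so it splits into six arguments.
            java -jar "$JAR" pcextract -t "$table" -output "$out" -c $BOX
        fi
    done
}

extract_all_scans
```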

Query by WGS84 coordinates (latitude, longitude)

If you know the lat/lon of the region you want to extract, you can use the syntax below with the -wgs84 tag.

java -jar $HOME/lidardb-0.0-jar-with-dependencies.jar pcextract -t dublin15 -output $HOME/out.las -c -6.258411705493927,53.340577164289094,-100,-6.257177889347076,53.34122894218849,500 -wgs84

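The 6 parameters passed to the -c tag are, in order: lon_min, lat_min, z_min (meters above sea level), lon_max, lat_max, z_max. Note that longitude comes first: in the example above, -6.258... is the longitude and 53.340... is the latitude.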
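In the example above the first value (-6.258...) is a longitude, since Dublin lies at roughly 53.3° N, 6.3° W, and swapping latitude and longitude is an easy mistake to make. The sketch below builds the comma-separated -c argument and rejects boxes that fall outside a rough Dublin range. wgs84_box is a hypothetical helper, and the bounds (longitude between -7 and -5.5, latitude between 53 and 54) are my own loose assumption, not something the database enforces.

```shell
# Path to the uploaded jar (assumed location; adjust if you stored it elsewhere).
JAR="${JAR:-$HOME/lidardb-0.0-jar-with-dependencies.jar}"

# Hypothetical helper, not part of lidardb: build the comma-separated -c value
# for a WGS84 query in the order used by the example above (longitude first).
wgs84_box() {
    if [ "$#" -ne 6 ]; then
        echo "usage: wgs84_box LON_MIN LAT_MIN Z_MIN LON_MAX LAT_MAX Z_MAX" >&2
        return 2
    fi
    # Rough, assumed bounds for the Dublin area; they catch swapped lat/lon.
    if ! awk -v lon1="$1" -v lat1="$2" -v lon2="$4" -v lat2="$5" 'BEGIN {
        ok = (lon1 >= -7 && lon2 <= -5.5 && lat1 >= 53 && lat2 <= 54 && lon1 < lon2 && lat1 < lat2)
        exit !ok
    }'; then
        echo "error: box is outside the assumed Dublin range or min/max are swapped" >&2
        return 1
    fi
    # Echo the original strings so no precision is lost.
    echo "$1,$2,$3,$4,$5,$6"
}

# Print the command for the example region above.
if BOX=$(wgs84_box -6.258411705493927 53.340577164289094 -100 \
                   -6.257177889347076 53.34122894218849 500); then
    echo "java -jar $JAR pcextract -t dublin15 -output $HOME/out.las -c $BOX -wgs84"
fi
```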

Additional parameters

In addition to the mandatory parameters above, pcextract takes the following optional parameters:

tag                 description                                                                        example
-m                  data storage model (see details in the paper)                                      4
-cache              cache size                                                                         1000
-no_hilbert_ranges  number of Hilbert ranges                                                           100
-conn               connection details if the database is not on the localhost or at the default port
-srid               convert to WGS84 from a non-default SRID (EPSG:29903)                              29903
-absolute_accuracy  toggle between the approximate and the accurate mode                               (empty)
-debug              log more information for debug purpose                                             (empty)

