FOSS4G Belgium 2019
Yesterday I went to FOSS4g.be in Brussels. Many interesting talks and people! A big thanks to the organizers.
I presented our new Python package dask-geomodeling, which we at Nelen & Schuurmans have worked on over the past years. This package is the engine of our GIS modeling platform Lizard Geoblocks. We open-sourced it this summer and aim to give it a broader user base.
In short, dask-geomodeling is a Python framework for GIS analytics. You define a view onto your raster or vector data, specify the spatial extent you want to compute, and submit that computation to dask. Dask can compute on your local machine using multithreading or multiprocessing, or it can compute on a distributed system of computers.
During the talk, some interesting questions were raised:
Why not use dask + xarray to parallelize operations over large datasets?
dask-geomodeling lazily evaluates the compute graph, while dask + xarray builds the complete compute graph for all tiles in memory. We often want very small tiles from very big datasets: the number of tiles easily runs into the millions, at which point the dask + xarray approach was just not performant enough.
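To make the idea of lazy, per-request evaluation concrete, here is a minimal pure-Python sketch. It is not the dask-geomodeling API and says nothing about how the library is implemented internally: the `Source` and `Add` classes and their `get_data` signature are hypothetical stand-ins, invented only to show that defining a view does no work until a specific tile is requested.

```python
class Source:
    """Hypothetical stand-in for a raster source; returns a constant tile."""

    def __init__(self, value):
        self.value = value
        self.calls = 0  # counts how often data was actually read

    def get_data(self, bbox, width, height):
        self.calls += 1
        return [[self.value] * width for _ in range(height)]


class Add:
    """Hypothetical block that adds two upstream blocks cell by cell."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def get_data(self, bbox, width, height):
        a = self.left.get_data(bbox, width, height)
        b = self.right.get_data(bbox, width, height)
        return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


elevation = Source(2)
offset = Source(3)

# Defining the view only wires blocks together; no data is read yet.
view = Add(elevation, offset)
calls_after_definition = elevation.calls

# Work happens only for the one tile that is requested.
tile = view.get_data(bbox=(0, 0, 10, 10), width=4, height=4)
```

With this structure, memory use depends on the tile being requested rather than on the total number of tiles in the dataset.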
Is it possible to compute, let’s say, a 0.5 m resolution map of all of the Netherlands?
Yes, but presently you will need to write some stuff yourself (example). It basically depends on how you want to output your data. For example, you could write a WriteToGeoTIFF building block that writes the result of your computation to a single file. Then you just add that at the end of your view definition and compute tile by tile. Possibly, you can also write a TileRequest building block that tiles the requested area for you, so that you can more easily parallelize.
We are working on this as well and aim to provide examples of this use case.
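As a rough illustration of the tile-by-tile approach, the sketch below splits a bounding box into tiles and hands each one to a compute and a write step. Everything here is an assumption made for the example: `iter_tiles`, `count_tiles` and `process` are hypothetical helpers (not part of dask-geomodeling), the tile size of 1024 x 1024 cells is arbitrary, and the 280 km by 320 km bounding box is only a country-sized placeholder, not an exact extent.

```python
import math


def count_tiles(bbox, tile_size_m):
    """Number of tiles needed to cover bbox = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = bbox
    return math.ceil((x2 - x1) / tile_size_m) * math.ceil((y2 - y1) / tile_size_m)


def iter_tiles(bbox, tile_size_m):
    """Lazily yield (x1, y1, x2, y2) tiles covering bbox, clipped to its edges."""
    x1, y1, x2, y2 = bbox
    nx = math.ceil((x2 - x1) / tile_size_m)
    ny = math.ceil((y2 - y1) / tile_size_m)
    for j in range(ny):
        for i in range(nx):
            tx1 = x1 + i * tile_size_m
            ty1 = y1 + j * tile_size_m
            yield (tx1, ty1, min(tx1 + tile_size_m, x2), min(ty1 + tile_size_m, y2))


def process(bbox, tile_size_m, compute, write):
    """Compute and write one tile at a time, so only one tile is held in memory."""
    for tile in iter_tiles(bbox, tile_size_m):
        write(tile, compute(tile))


# Assumed numbers: 0.5 m cells and 1024 x 1024 cells per tile (512 m per side).
CELL_SIZE_M = 0.5
TILE_CELLS = 1024
tile_size_m = CELL_SIZE_M * TILE_CELLS

# Placeholder country-sized extent in a metric projection (280 km x 320 km).
example_bbox = (0, 300_000, 280_000, 620_000)
n_tiles = count_tiles(example_bbox, tile_size_m)

# Tiny demo run with stand-in compute and write functions.
written = {}
process(
    (0, 0, 1000, 600),
    512,
    compute=lambda t: (t[2] - t[0]) * (t[3] - t[1]),  # stand-in: tile area in m2
    write=written.__setitem__,  # stand-in for a GeoTIFF writer
)
```

Because `iter_tiles` is a generator, the list of tiles is never materialized; in a real setup each tile could also be submitted to a separate worker.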
How does dask-geomodeling’s performance compare with other GIS tools such as GRASS?
We don’t know yet! We just started to use dask-geomodeling for batch computations. dask-geomodeling relies mostly on other libraries (GDAL, numpy, scipy, geopandas), so this question may propagate to those libraries as well.
OGC RESTful standards
I was excited to hear that OGC is currently working on OGC API: a RESTful OpenAPI specification for features, coverages, map tiles, and more. As we at Nelen & Schuurmans are continuously working on RESTful APIs for geospatial data, I will start following this more closely.
Johan van de Wauw showed how to use SAGA and how many raster and vector processing utilities it offers. I saw there is also a Python wrapper. It would be really interesting to see whether we can use it to expose SAGA utilities through the numpy C-API, as we did in PyGEOS.
As I am currently working on a new timeseries storage backend (using TimescaleDB), MobilityDB, a moving-object extension for PostgreSQL/PostGIS, interested me a lot. MobilityDB defines new PostgreSQL data types that store timeseries of points instead of single points. With this database structure one can process trajectories of ~1000 points much more efficiently than with standard PostgreSQL techniques.