Last week we had the n-th GIS drama about what Mapbox Vector Tiles should be called. I’m actually thinking of creating the GIS version of rubydramas (it’s gone; looks like the Ruby community has moved to node.js, sorry, io.js). This post is not about naming; we all know standards with company names in the description are never used, look at those ESRI shapefiles…
The objective of vector tiles (whatever format you use) is to move the data closer to the rendering stage, so it’s projected, clipped, transformed, stripped of extra precision, filtered using naive filters, encoded and finally compressed. You should take a look at this blog post by Michal Migurski or this talk by Dane Springmeyer at FOSS4G to understand the rationale behind them.
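To make “projected, transformed, stripped of extra precision” a bit more concrete, here’s a minimal sketch of those first steps, assuming spherical Web Mercator (EPSG:3857), XYZ tile addressing with y counted from the top, and a 256-pixel tile extent. The helper names are hypothetical, not CartoDB code:

```python
import math

# Spherical Web Mercator (EPSG:3857) constants.
EARTH_RADIUS = 6378137.0
ORIGIN_SHIFT = math.pi * EARTH_RADIUS  # half the width of the world, in meters


def lonlat_to_mercator(lon, lat):
    """Project WGS84 lon/lat (degrees) to EPSG:3857 meters."""
    x = math.radians(lon) * EARTH_RADIUS
    y = math.log(math.tan(math.pi / 4 + math.radians(lat) / 2)) * EARTH_RADIUS
    return x, y


def tile_bounds(z, x, y):
    """(minx, miny, maxx, maxy) in EPSG:3857 for XYZ tile (z, x, y), y from the top."""
    size = 2 * ORIGIN_SHIFT / (2 ** z)
    minx = -ORIGIN_SHIFT + x * size
    maxy = ORIGIN_SHIFT - y * size
    return minx, maxy - size, minx + size, maxy


def to_tile_pixels(mx, my, z, x, y, extent=256):
    """Transform a mercator point into integer pixel coordinates local to the tile."""
    minx, miny, maxx, maxy = tile_bounds(z, x, y)
    px = (mx - minx) / (maxx - minx) * extent
    py = (maxy - my) / (maxy - miny) * extent  # pixel y grows downwards
    return round(px), round(py)  # drop sub-pixel precision
```

For instance, lon/lat (0, 0) lands at pixel (128, 128) of the single zoom-0 tile. Once coordinates are small tile-local integers, everything downstream (snapping, simplification, encoding) works on far less information than the original doubles.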
CartoDB is a platform that runs on top of PostGIS. It renders tiles from data fetched directly with a spatial query, so we pay the price for all the overhead of going to the database, fetching the data, preparing it for rendering and so on. That’s why at some point someone thought about removing that dynamic part of the equation. On the other hand, we have all the power of Postgres and PostGIS (you know, spatial indices, geometry functions and so on).
So would it be possible to easily generate a vector tile from Postgres? Of course it is; you can do almost everything in Postgres right now with an extension, but I’m one of those people who like to do obvious things. It sounds easy; basically we need:
So here it is, given a query that returns a result set with cartodb_id and the_geom_webmercator (a geometry column in EPSG:3857):
Notice this query does not handle buffer size, overzooming and so on; that’s pretty easy to add, though. Also, there is a res/20 that needs some extra explanation. If we used half a pixel for the snapping, we’d soon realize that some polygons and lines disappear, so using 20 fixes that. I have to say that value was found by hand and there is no math behind it: why spend hours thinking when a simple binary search can fix the problem… The geometry is also simplified after snapping (be sure to do it after snapping, since simplification is more expensive than snapping, and snapping leaves it fewer vertices to process).
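Here’s a hedged Python sketch of the snap-then-simplify step, assuming 256-pixel Web Mercator tiles so that res is meters per pixel at zoom z. In PostGIS the natural building blocks are ST_SnapToGrid and ST_Simplify (Douglas-Peucker); the functions below only mimic them on a plain list of points. resolution, snap_to_grid, simplify and prepare_line are hypothetical helpers, and the simplification tolerance in pixels (simplify_pixels) is an assumption, since the value used isn’t stated:

```python
import math

ORIGIN_SHIFT = math.pi * 6378137.0  # half the Web Mercator world width, meters


def resolution(z, tile_size=256):
    """Meters per pixel at zoom z."""
    return 2 * ORIGIN_SHIFT / (tile_size * 2 ** z)


def snap_to_grid(points, grid):
    """Snap (x, y) points to a regular grid and drop consecutive duplicates."""
    snapped = []
    for x, y in points:
        p = (round(x / grid) * grid, round(y / grid) * grid)
        if not snapped or snapped[-1] != p:
            snapped.append(p)
    return snapped


def _point_segment_distance(p, a, b):
    """Distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment, e.g. a closed ring
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Recursive Douglas-Peucker line simplification."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the segment joining the endpoints.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # avoid duplicating the split vertex


def prepare_line(points, z, simplify_pixels=0.5):
    """Snap to a res/20 grid first, then simplify (tolerance is hypothetical)."""
    res = resolution(z)
    return simplify(snap_to_grid(points, res / 20), simplify_pixels * res)
```

At zoom 0 a pixel is about 156.5 km, so a res/20 grid is about 7.8 km. Snapping is a single linear pass, while Douglas-Peucker is O(n log n) on average and O(n²) in the worst case, which is why it pays to snap first and hand the simplifier a shorter vertex list.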
Does this work?
Let’s try with an extract of the OSM planet where the geometry is about 350 MB.
The CartoDB Vector Tile (did I say I’m pretty good at naming?) is 44 MB (3.8 MB gzip-compressed), so not that bad.
But we still haven’t done anything special with the geometry encoding; we are using WKB to store everything. Remember that WKB uses 16 bytes per 2D coordinate (two 8-byte doubles) in a geometry. Mapbox Vector Tiles use varint encoding of delta values to make this smaller. I personally don’t like varints for encoding numbers; it’s better to let the compressor do its work and not try to be smart playing with bits. But OK, in PostGIS we have a way to delta-varint all the things, it’s called TWKB:
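For reference, here’s a minimal sketch of the delta + zigzag + varint scheme that both MVT and TWKB build on, once coordinates are already integers (MVT works in integer tile coordinates; TWKB scales by a number of decimal digits and rounds). The function names are hypothetical, and the WKB side counts only coordinate bytes, ignoring WKB’s byte-order, type and count headers:

```python
import struct


def zigzag(n):
    """Map signed ints to unsigned so small magnitudes stay small: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * n if n >= 0 else -2 * n - 1


def varint(u):
    """LEB128-style varint: 7 bits per byte, high bit set while more bytes follow."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_delta_varint(coords):
    """Encode integer (x, y) pairs as zigzag varints of the delta to the previous vertex."""
    out = bytearray()
    prev_x = prev_y = 0
    for x, y in coords:
        out += varint(zigzag(x - prev_x))
        out += varint(zigzag(y - prev_y))
        prev_x, prev_y = x, y
    return bytes(out)


def encode_wkb_coords(coords):
    """Coordinates only, the way WKB stores them: two little-endian float64 per point."""
    return b"".join(struct.pack("<dd", x, y) for x, y in coords)
```

For three nearby vertices such as (1000, 1000), (1001, 1002), (1003, 1001), the delta-varint form takes 8 bytes against 48 bytes of raw WKB coordinates. How much of that gap a general-purpose compressor would close on real data is exactly the trade-off I’m grumbling about above.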
copy (select st_astwkbagg(geom, 0, id) from cdb_tile(0, 0, 0, 'select id as cartodb_id, the_geom as the_geom_webmercator from planet')) TO '/tmp/tile2.cvt';
The result is a 9.8 MB (1.8 MB gzip-compressed) tile: much better, and it took about the same time to encode. This also works much better with polygon/line tables than with mixed geometry types, especially when there are a lot of points, as in this case.
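One plausible reading of the remark about points: delta encoding pays off inside a geometry, where consecutive vertices are close together, but a point geometry has a single vertex. Assuming each geometry’s delta chain starts at the origin (as a standalone TWKB geometry’s does), every point pays for full-size coordinates. Here’s a rough byte count with hypothetical helper names and synthetic data:

```python
def zigzag(n):
    """Map signed ints to unsigned: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * n if n >= 0 else -2 * n - 1


def varint_len(u):
    """Number of bytes a 7-bits-per-byte varint needs for the unsigned int u."""
    n = 1
    while u >= 0x80:
        u >>= 7
        n += 1
    return n


def delta_bytes(coords):
    """Bytes to delta + zigzag + varint encode one geometry's integer coordinates.
    Assumption: the delta chain starts at (0, 0) for every geometry."""
    total, px, py = 0, 0, 0
    for x, y in coords:
        total += varint_len(zigzag(x - px)) + varint_len(zigzag(y - py))
        px, py = x, y
    return total


# Synthetic data: 100 vertices one unit apart, far from the origin.
vertices = [(500_000 + i, 700_000 + i) for i in range(100)]

as_line = delta_bytes(vertices)                      # one geometry: small deltas after the first vertex
as_points = sum(delta_bytes([v]) for v in vertices)  # 100 geometries, each starting from (0, 0)
```

Here as_line comes out at 204 bytes and as_points at 600, against 1600 bytes of raw WKB coordinates for either layout, so a table full of points gets a much smaller relative win.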
There are a lot of things left, for example when to use clipping, snapping and simplification (sometimes it’s better to send every single geometry than to cut), coincident points, and attribute optimization (I didn’t talk about attributes here because with Postgres it’s pretty clear how to do this).