[Tile-serving] Performance of COORDS at updates
Paul Norman
penorman at mac.com
Wed Mar 18 08:24:14 UTC 2015
On 3/18/2015 12:56 AM, Robert Buchholz wrote:
> (I was using this mailing list in digest mode, and couldn't find a way
> to directly reply to the messages of Paul and Lynn in that mode).
>
> As Lynn pointed out, updates themselves are not yet implemented in
> COORDS. However, all data structures to support updates are in place
> (flat files of all node, way and relation data, indexed by their
> respective entity id). The six hours for data import already include
> writing out these files (about 191GB for a planet dump) as well as
> creating the actual geometry tiles.
What data structure is used to do a lookup of ways that reference a
particular node?
For those following along but not deeply involved in writing converters
for whole-planet scale OSM data, the increased size and decreased speed
of an import that can be updated is not caused by needing to find the
properties of an object by ID, but finding the parent ways of a node for
when the node has moved, or a comparable question with relations.
This is solved in a few ways.
pgsnapshot and apidb have a way_nodes table which stores the way id, node
id and position in way. Indexes allow lookups to be done by node id or way id.
The disadvantages of this method stem from the size of the table needed.
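To make the way_nodes approach concrete, here is a minimal sketch. It uses Python's bundled sqlite3 rather than PostgreSQL so that it runs anywhere, and the column names, index name and helper functions are illustrative assumptions, not the actual pgsnapshot or apidb schema.

```python
import sqlite3

def build_way_nodes(ways):
    """Load {way_id: [node_id, ...]} into a way_nodes table (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE way_nodes ("
        " way_id INTEGER NOT NULL,"
        " node_id INTEGER NOT NULL,"
        " sequence_id INTEGER NOT NULL,"   # position of the node within the way
        " PRIMARY KEY (way_id, sequence_id))"
    )
    # Second index so the table can be searched by node id as well as way id.
    conn.execute("CREATE INDEX way_nodes_node_idx ON way_nodes (node_id)")
    rows = [(way_id, node_id, seq)
            for way_id, nodes in ways.items()
            for seq, node_id in enumerate(nodes)]
    conn.executemany("INSERT INTO way_nodes VALUES (?, ?, ?)", rows)
    return conn

def parent_ways(conn, node_id):
    """Return the ids of all ways that reference the given node."""
    cur = conn.execute(
        "SELECT DISTINCT way_id FROM way_nodes WHERE node_id = ? ORDER BY way_id",
        (node_id,),
    )
    return [row[0] for row in cur]

if __name__ == "__main__":
    conn = build_way_nodes({10: [1, 2, 3], 11: [3, 4], 12: [5, 6, 5]})
    print(parent_ways(conn, 3))
```

Note that the table holds one row per node reference rather than one row per way, which is where the size problem comes from at planet scale.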
osm2pgsql stores an array of nodes with each way and does an array
overlap query (&&) which uses a GIN index built on the nodes column. The
disadvantage of this is that GIN indexes are comparatively slow to build
and rely on random IO when building. On a machine with a fast CPU and
fast sequential disk access but slow random disk access, building the
GIN index takes the majority of the import time. This can be avoided
with --slim --drop, which does not build the index and substantially
reduces the import time, at the cost of an import that cannot be updated.
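Conceptually, a GIN index on an array column is an inverted index from each element to the rows that contain it. The toy model below shows what an overlap lookup has to answer. It is only a sketch of the idea in plain Python, with hypothetical function names, and is not how osm2pgsql or PostgreSQL implement it.

```python
from collections import defaultdict

def build_node_index(ways):
    """Build an inverted index {node_id: {way_id, ...}} from {way_id: [node_id, ...]}."""
    index = defaultdict(set)
    for way_id, nodes in ways.items():
        for node_id in nodes:
            index[node_id].add(way_id)
    return index

def ways_overlapping(index, changed_nodes):
    """Return ways whose node list shares at least one node with changed_nodes.

    This mirrors the semantics of an array overlap test: a way matches if
    any one of its nodes is in the changed set.
    """
    found = set()
    for node_id in changed_nodes:
        # .get avoids creating empty entries in the defaultdict for unknown nodes.
        found |= index.get(node_id, set())
    return sorted(found)

if __name__ == "__main__":
    index = build_node_index({10: [1, 2, 3], 11: [3, 4], 12: [5, 6, 5]})
    print(ways_overlapping(index, [4, 5]))
```

Building such an index means touching one entry per distinct node in the order nodes happen to appear in ways, which hints at why the real thing leans on random IO.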
For assorted reasons, the separate table and B-tree index method is
faster, but this turns out not to be particularly important with osm2pgsql.
With both of these you also have to update the data structure (index) as
data changes, leading to bloat. GIN indexes bloat significantly worse
than B-tree indexes.