[OSM-dev] Expiring Tiled OSM Data

Paul Norman penorman at mac.com
Thu Jun 20 19:41:41 UTC 2013


> From: Andy Allan [mailto:gravitystorm at gmail.com]
> Sent: Thursday, June 20, 2013 5:14 AM
> Subject: Re: [OSM-dev] Expiring Tiled OSM Data
> 
> On 19 June 2013 20:58, Ian Dees <ian.dees at gmail.com> wrote:
> 
> > So we're talking potentially several billion (entity type)+(entity id)
> > -> [(tile x)+(tile y), ...] rows in a database of some sort along with
> > the forward and reverse indexes.
> 
> Indeed. Although there's probably a much clever approach than the one
> I'm suggesting.

Here's my thoughts. I'm ignoring relations because the same logic that works
for ways should work for them.

This also becomes simpler if clients can handle ways where not all of the
nodes referenced by the way are in the file.

For each tile store

- All node locations and tags for nodes in the tile

- All way tags and referenced node list

- All node locations and tags for nodes referenced by the ways above

For a request you could then assemble this information in a deterministic
manner into XML, and cache this XML. There's no reason you couldn't
serialize the XML in advance, but you might not want to because XML is so
slow to serialize or parse and serializing XML that gets invalidated before
its requested is a waste. You could exclusively use XML but then you'd have
to parse and serialize the tile for each change, which is expensive. PBF
might be better. 

You can then cache the resulting XML file and only invalidate it if
something in that tile changes. Tile changes brings us to the complicated
part.

To handle changes you're going to need a few indexes or lookup tables or
something. 

First is a node ID to tile number table. Second is a way ID to tile numbers
that include the way table. 

Node creates are simple, convert (lat,lon) to tile number, apply the node
addition to the tile, invalidate the XML representation of the tile.

Node deletes are more complicated. There is where the node ID to tile number
index is absolutely required as a delete alone doesn't tell you where you
were deleting from. Use the index to look up which tile the delete needs
applying to, then apply and invalidate.

Node modifies are both a delete and create conceptually for the purposes of
determining the tiles to apply the modify to because you don't know in
advance if the node was moved or not.

A way create involves a lookup of all the nodes to get a list of tiles,
storing that list of tiles applying the create to those tiles and
invalidating them.

A way delete involves a lookup in the table to get tile numbers to apply it
to, applying the delete to those tiles, invalidating those tiles and then
deleting the entry for that way in the lookup table.

A way modify is a lookup for old tile numbers, a calculation of new tile
numbers, then applying to modify to all of those and modifying the entry in
the lookup table.

Relations would mean you'd need another table for relation ID to tiles, and
any changes to relations involving ways would build a new entry by looking
at the ways to tiles table.
	
This involves two (+relations) tables but they would be smaller than a full
node ID to (lat,lon) or way ID to node ID table. 

A node ID -> (lat,lon) table takes 16 bytes/node (osm2pgsql flat nodes) and
about 16GB with current data. You could fit tile numbers into a 32-bit value
(4.2 billion tiles), which is 4 bytes/node or 4GB with current data, easily
fit in ram. Another advantage is you don't need to do lat,lon to node ID
lookups out of this table like you do with pgsnapshot.

A way ID -> ((lat,lon), ...) or way ID -> node ID table is very large.
Because ways will tend to have multiple nodes in the same tile there would
be less entries, and each entry would take less storage than a node ID or
(lat,lon). The size here would be hard to estimate because it depends on
tile size - at the two extremes it would take no space for a tile covering
the entire world or (4+8+8) bytes per node reference in a way (about 2.2
billion, total of 44GB + some kind of index) for very small tiles. For
32-bit tiles you'd need 8 bytes per entry, but a lot of ways are buildings
which should tend to lie in one tile and only have one entry for the way,
not one for each node reference. Given that 20% of the DB is French imported
buildings, my guess would be on the order of 4GB, but you'd need some kind
of index or clever storage method.





More information about the dev mailing list