[OSM-dev] keeping thematic planet extract up to date
Igor Podolskiy
igor.podolskiy at vwi-stuttgart.de
Tue Oct 18 18:06:43 BST 2011
Hi Martijn,
> this seems to run OK, but invariably, after letting this run for a few
> hours with a 5 minute interval (to catch up, my initial extract is a
> couple of months old) the database table only holds a small number (less
> than 20) nodes. What is going wrong here?
well, sorry to say that, but it has multiple problems :)
1. Your filter (--tf accept-ways x=* --tf accept-nodes x=*) doesn't do
what you want, because it filters out _all_ nodes that aren't tagged
with gnis:id=*, including those that constitute the gnis:id=* ways. So
you end up with a bunch of ways in your stream with empty geometries.
This is probably the main reason you see only a small number of nodes in
the DB, because there's nothing else --wp or --wpc can write to the
DB. See my earlier post [1] about how to do tag-based filtering with
osmosis :)
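For illustration, a filter built with stock Osmosis tasks usually looks
something like the sketch below (placeholder file names, see [1] for
the details): --tf selects the ways, and --used-node keeps the nodes
they reference instead of throwing them away.
-------
# sketch: keep gnis:id=* ways together with the nodes that build them
# (add a second pass plus --merge if you also want standalone
# gnis:id=* nodes)
osmosis --rb extract.osm.pbf \
    --tf accept-ways gnis:id=* \
    --tf reject-relations \
    --used-node \
    --wb gnis-ways.osm.pbf
-------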
2. In the long run, you'll get wrong data if you only store the filtered
data. Consider this scenario:
T+0: you get your initial extract, way 12345 has no gnis:id, gets
filtered out and is not stored in the DB
T+1: somebody sets gnis:id=foo on way 12345
T+2: you get a change stream from replication that says "update way
12345 with these tags" - but you have no stored way 12345 to apply it
to. From your point of view, this "update" is really a "create" - but
nobody except you knows that. Even worse, you have no nodes for this
way because they were filtered out at T+0 and are not included in the
change stream. No nodes -> no geometry, even if you manage to sneak
that way object into your DB somehow.
I maintain some thematic extracts for my work myself. Here's what I do:
-------
#!/bin/bash
# archive the last known good version
mv germany-railways.osm.pbf germany-railways.osm.pbf.1
# replicate the full extract, calls osmosis --rri
$HOME/scripts/get-changes.sh germany-boxed.osm.pbf state
# "thematic filtering", calls Osmosis to filter out railways
$HOME/scripts/filter-railways.sh germany-boxed.osm.pbf \
    germany-railways.osm.pbf
# derive change for the railways
osmosis --rb germany-railways.osm.pbf --sort --rb \
    germany-railways.osm.pbf.1 --sort --derive-change bufferCapacity=10000 \
    --lpc --wxc railways.osc
# update DB (this is the osm2pgsql equivalent to --wpc)
osm2pgsql -U podolsir -d gis --prefix osm_railways -a -m -s -S \
    $HOME/scripts/railways.style railways.osc
------
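get-changes.sh itself isn't shown above; it just wraps the standard
--rri pipeline. A minimal sketch of what such a script typically does
($1 being the extract and $2 the replication working directory, as in
the call above):
-------
#!/bin/bash
# sketch: fetch the accumulated change from the replication server and
# apply it to the local extract in place
set -e
EXTRACT="$1"
WORKDIR="$2"
osmosis --rri workingDirectory="$WORKDIR" \
    --simplify-change \
    --rb "$EXTRACT" \
    --apply-change \
    --wb "$EXTRACT.new"
mv "$EXTRACT.new" "$EXTRACT"
-------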
Basically, this way you keep your replication targets compatible with
the respective replication sources (more or less - a bbox-based extract
is not fully watertight either, but it works for a reasonably generous
bbox). Based on that, you do your tag-based filtering and derive a
change which has the right "updates" and "creates".
Yes, this _is_ much slower than the "intuitive" way (I started out with
that, too :)), because you need to run _all_ the data you have through
--apply-change this way.
You could try to keep everything in your PostGIS database and then just
SELECT the stuff that has "gnis:id" for the actual processing. However,
I don't know what that means in performance terms, as I haven't used
that kind of database at any scale worth mentioning. My guess would be
that --apply-change gets faster, but you'll need much more disk space.
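For example, with the full data kept in the pgsnapshot schema (the one
--wp/--wpc write to), the per-run selection could be a simple hstore
query - a sketch only, assuming the standard "ways" table with an
hstore tags column; user and database names are placeholders:
-------
# sketch: select only the gnis:id-tagged ways from a full import
psql -U podolsir -d gis -c \
    "SELECT id, tags -> 'gnis:id' AS gnis_id FROM ways WHERE tags ? 'gnis:id';"
-------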
In any case: if you replicate, you need a source and a target that are
compatible. Since your replication source is the planet, ideally you
should have a complete planet as the target. Large geographic extracts
work more or less; tag-based extracts almost never work as replication
targets.
Hope that helps
Igor
[1] http://lists.openstreetmap.org/pipermail/dev/2011-April/022394.html