[OSM-dev] keeping thematic planet extract up to date

Martijn van Exel m at rtijn.org
Tue Oct 18 18:51:54 BST 2011


Thanks for your elaborate explanation. I see the problem with the ways
potentially lacking geometries.
Would the problem be much simpler if I constrain the data set to nodes only?
Why are you using osm2pgsql for the database update step? Is that because
you require that schema or are there other reasons?

On Tue, Oct 18, 2011 at 11:06 AM, Igor Podolskiy <igor.podolskiy at vwi-stuttgart.de> wrote:

> Hi Martijn,
>> this seems to run OK, but invariably, after letting this run for a few
>> hours with a 5 minute interval (to catch up; my initial extract is a
>> couple of months old) the database table only holds a small number (fewer
>> than 20) of nodes. What is going wrong here?
> Well, sorry to say it, but your setup has multiple problems :)
> 1. Your filter (--tf accept-ways x=* --tf accept-nodes x=*) doesn't do
> what you want, because it filters out _all_ nodes that aren't tagged with
> gnis:id=*, including those that make up the gnis:id=* ways. So you end up
> with a bunch of ways in your stream with empty geometries. This is probably
> the main reason you see only a small number of nodes in the DB: there's
> nothing else --wp or --wpc can write to the DB. See my earlier
> post [1] about how to do tag-based filtering with osmosis :)
> 2. In the long run, you'll get wrong data if you only store the filtered
> data. Consider this scenario:
> T+0: you get your initial extract, way 12345 has no gnis:id, gets filtered
> out and is not stored in the DB
> T+1: somebody sets gnis:id=foo on way 12345
> T+2: you get a change stream from replication which says "Update way 12345
> with these tags", and you have nothing to update. From your point of view,
> this "update" is a "create", but nobody except you knows that. Even worse, you
> have no nodes for this way, because they were filtered out at T+0 and are not
> included in the change stream. No nodes -> no geometry, even if you manage
> to sneak that way object into your DB somehow.
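
A minimal sketch of tag-based filtering that avoids the empty-geometry problem from point 1: ways carrying gnis:id are kept together with the nodes they reference, stand-alone tagged nodes are picked up in a second branch, and the two streams are merged. It assumes an osmosis version that provides --used-node, --merge and named pipes (check this against the documentation for your version); the function name filter_gnis and the file names are hypothetical placeholders.

```shell
#!/bin/bash
# Hypothetical helper: tag-based extract that keeps way geometries intact.
# Assumes osmosis with --used-node, --merge and named pipes (verify for
# your version).
# Branch 1: ways tagged gnis:id, relations dropped, plus only the nodes
#           those ways reference, so the ways keep their geometries.
# Branch 2: nodes tagged gnis:id on their own; ways and relations dropped.
# The tag filter is quoted so the shell does not glob-expand the '*'.
filter_gnis() {
  local in=$1 out=$2
  osmosis \
    --rb "$in" \
    --tf accept-ways 'gnis:id=*' --tf reject-relations --used-node \
    outPipe.0=ways \
    --rb "$in" \
    --tf accept-nodes 'gnis:id=*' --tf reject-ways --tf reject-relations \
    outPipe.0=nodes \
    --merge inPipe.0=ways inPipe.1=nodes \
    --wb "$out"
}
```

Usage would be something like `filter_gnis extract.osm.pbf gnis.osm.pbf`. As point 2 explains, the result is still only suitable as a snapshot to diff against, not as a replication target.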
> I maintain some thematic extracts for my work myself. Here's what I do:
> -------
> #!/bin/bash
> # archive the last known good version
> mv germany-railways.osm.pbf germany-railways.osm.pbf.1
> # replicate the full extract, calls osmosis --rri
> "$HOME/scripts/get-changes.sh" germany-boxed.osm.pbf state
> # "thematic filtering", calls Osmosis to filter out railways
> "$HOME/scripts/filter-railways.sh" germany-boxed.osm.pbf \
>     germany-railways.osm.pbf
> # derive change for the railways
> osmosis --rb germany-railways.osm.pbf --sort \
>     --rb germany-railways.osm.pbf.1 --sort \
>     --derive-change bufferCapacity=10000 --lpc \
>     --wxc railways.osc
> # update DB (this is the osm2pgsql equivalent to --wpc)
> osm2pgsql -U podolsir -d gis --prefix osm_railways -a -m -s -S \
>     "$HOME/scripts/railways.style" railways.osc
> ------
> Basically, this way you keep your replication targets compatible with the
> respective replication sources (more or less; a bbox-based extract is not
> fully watertight either, but it works for a reasonably generous bbox).
> Based on that, you do your tag-based filtering and derive a change which has
> the right "updates" and "creates".
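
To make the "filter both snapshots, then derive a change" idea concrete, here is a toy model of what --derive-change does, written in plain shell rather than osmosis. It assumes a hypothetical text format with one `<type><id> <version>` line per object of a filtered snapshot (e.g. `w12345 4`); this is not how osmosis stores or compares data. In the T+0/T+2 scenario above, way 12345 is absent from the old filtered snapshot and present in the new one, so it comes out as a create.

```shell
#!/bin/bash
# Toy model of --derive-change (not osmosis code). Each input file holds one
# "<type><id> <version>" line per object of a tag-filtered snapshot.
derive_change_toy() {
  local old new
  old=$(mktemp) new=$(mktemp)
  # join needs both inputs sorted on the id field, much like --derive-change
  # needs sorted input (hence the --sort tasks in the script above).
  LC_ALL=C sort -k1,1 "$1" > "$old"
  LC_ALL=C sort -k1,1 "$2" > "$new"
  # ids only in the new snapshot -> create
  LC_ALL=C join -v 2 "$old" "$new" | awk '{ print "create", $1 }'
  # ids only in the old snapshot -> delete
  LC_ALL=C join -v 1 "$old" "$new" | awk '{ print "delete", $1 }'
  # ids in both snapshots with a different version -> modify
  LC_ALL=C join "$old" "$new" | awk '$2 != $3 { print "modify", $1 }'
  rm -f "$old" "$new"
}
```

In the script's terms, the old snapshot corresponds to germany-railways.osm.pbf.1 and the new one to germany-railways.osm.pbf.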
> Yes, this _is_ much slower than the "intuitive" way (I started out with
> that, too :)), because this way --apply-change has to process _all_ the
> data you have.
> You could try to keep everything in your PostGIS database and then just
> SELECT the objects that have "gnis:id" for actual processing. However, I
> don't know what that means in performance terms, as I haven't used that kind
> of database yet at any scale worth mentioning. My guess would be that
> --apply-change gets faster but you'll need much more disk space.
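
A minimal sketch of the "keep everything and SELECT" alternative. It assumes the full extract was imported with osm2pgsql's --hstore (-k) option, so every tag ends up in an hstore column named tags, and that the default planet_osm table prefix is used; the function name is hypothetical. Only the point table is queried here; lines and polygons would need the same query against their own tables.

```shell
#!/bin/bash
# Hypothetical helper: list point objects carrying gnis:id from a full import.
# Assumes osm2pgsql --hstore (tags in an hstore column "tags") and the
# default "planet_osm" prefix; adjust table names for --prefix imports.
select_gnis_points() {
  local db=$1
  # -A: unaligned output, -t: rows only (no header or footer).
  # hstore's ? operator tests for the key; -> extracts its value.
  psql -d "$db" -At -c \
    "SELECT osm_id, tags -> 'gnis:id' FROM planet_osm_point WHERE tags ? 'gnis:id';"
}
```

hstore supports GIN and GiST indexes for the ? operator, so an index on tags would probably matter at any real scale; how this compares with the filter-and-diff approach is untested here.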
> In any case: if you replicate, you need a source and a target that are
> compatible. Since your replication source is the planet, ideally you should
> have a complete planet as the target. Large geographic extracts work more or
> less; tag-based extracts almost never work as replication targets.
> Hope that helps
> Igor
> [1] http://lists.openstreetmap.org/pipermail/dev/2011-April/022394.html

martijn van exel
geospatial omnivore
1109 1st ave #2
salt lake city, ut 84103