[OSM-talk] osm2pgsql & planet: frustrations, cutoffs, and idempotence

Brett Henderson brett at bretth.com
Mon Oct 27 22:04:46 GMT 2008


On Tue, Oct 28, 2008 at 7:39 AM, Michal Migurski <mike at stamen.com> wrote:

> >> Finally, the boundaries between the hourlies and dailies seem
> >> misaligned.
> >>
> >
> > This shouldn't be the case.
> >> After running the remaining hourlies for the 22nd, I attempted to
> >> pick up on the 23rd with a daily. The final hourly I used was
> >> 2008102223-2008102300.osc.gz. It's my expectation that I should be
> >> able to immediately follow that with 20081023-20081024.osc.gz, but
> >> this led to a duplicate key violation, suggesting that there's an
> >> overlap between the two files. Continuing with hourlies *works*,
> >> but is tedious and I suspect slower than the dailies.
> >>
> >
> > You should have been able to do what you've suggested.  If you are
> > finding problems, please provide me with some example data which is
> > misaligned between the two types of changesets.
>
> Try the two files mentioned above - that's where I saw this behavior,
> they're quite recent.
>
>        2008102223-2008102300.osc.gz
>        20081023-20081024.osc.gz
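If the two file names are read as time intervals, they should meet exactly at 2008-10-23 00:00. A quick way to confirm this for any sequence of files is to parse the names and check that each file ends where the next begins. The sketch below assumes only the naming seen in this thread (YYYYMMDDHH-YYYYMMDDHH.osc.gz for hourlies, YYYYMMDD-YYYYMMDD.osc.gz for dailies); the helper names are hypothetical. If the names line up, any overlap has to be in the file contents rather than in the chosen boundaries.

```python
from datetime import datetime


def parse_interval(filename):
    """Return (start, end) datetimes encoded in a changeset file name.

    Assumes a bare file name such as "2008102223-2008102300.osc.gz" (hourly)
    or "20081023-20081024.osc.gz" (daily), as used in this thread.
    """
    stem = filename.split(".", 1)[0]  # drop ".osc.gz"
    start_s, end_s = stem.split("-")
    fmt = "%Y%m%d%H" if len(start_s) == 10 else "%Y%m%d"
    return datetime.strptime(start_s, fmt), datetime.strptime(end_s, fmt)


def check_contiguous(filenames):
    """List (previous, next, gap) for neighbouring files that do not meet exactly.

    A positive gap means a hole between files; a negative one means an overlap.
    """
    problems = []
    intervals = [(f, *parse_interval(f)) for f in filenames]
    for (f1, _, end1), (f2, start2, _) in zip(intervals, intervals[1:]):
        if end1 != start2:
            problems.append((f1, f2, start2 - end1))
    return problems


if __name__ == "__main__":
    files = ["2008102223-2008102300.osc.gz", "20081023-20081024.osc.gz"]
    print(check_contiguous(files))  # an empty list means the names line up
```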


I need you to provide some specific examples of broken data.  If you can say
that "way 27123456 is created in both of the above files even though they
are for different time periods" then I can take a look at why this may have
occurred.  Just saying that there is misalignment between those two files
doesn't help me at all.  Presumably you ran into a specific problem and
received a specific error message; this is the kind of information I need.
I only do this project in my spare time and can't go looking for problems
that I'm not sure even exist, I have enough known problems to look into
already :-)
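One way to turn a duplicate key violation into the kind of report asked for above ("way 27123456 is created in both files") is to list every element that appears under `<create>` in both osmChange files. The following is a minimal sketch using only the Python standard library; it assumes the usual osmChange layout (`<create>`, `<modify>` and `<delete>` blocks containing `<node>`, `<way>` and `<relation>` elements with an `id` attribute), and the function names are hypothetical.

```python
import gzip
import sys
import xml.etree.ElementTree as ET

ACTIONS = {"create", "modify", "delete"}
ELEMENTS = {"node", "way", "relation"}


def created_elements(path):
    """Return the set of (type, id) pairs listed under <create> in an osmChange file."""
    opener = gzip.open if path.endswith(".gz") else open
    created = set()
    action = None  # which osmChange block we are currently inside
    with opener(path, "rb") as f:
        for event, elem in ET.iterparse(f, events=("start", "end")):
            if event == "start":
                if elem.tag in ACTIONS:
                    action = elem.tag
                continue
            # "end" event: the element and its attributes are complete
            if elem.tag in ELEMENTS:
                if action == "create":
                    created.add((elem.tag, int(elem.get("id"))))
                elem.clear()  # keep memory use low on large daily files
            elif elem.tag in ACTIONS:
                action = None
    return created


def created_in_both(path_a, path_b):
    """Elements created in both files, sorted for a readable report."""
    return sorted(created_elements(path_a) & created_elements(path_b))


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python dupes.py 2008102223-2008102300.osc.gz 20081023-20081024.osc.gz
    for kind, elem_id in created_in_both(sys.argv[1], sys.argv[2]):
        print(kind, elem_id)
```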


>
>
>
> >> My sense from reading other people's experiences has been that it's
> >> a common pattern to rely solely on the weekly planet dumps,
> >> incurring the substantial overhead of parsing and importing the
> >> full 5GB dump once every week, and then re-rendering the complete
> >> set of tiles.
> >>
> >
> > For a long time weekly planet dumps were the only bulk data
> > available.  Osmosis changesets have been on the scene for some time
> > now though and are gradually being utilised by more and more
> > clients.  As the planet grows, this will become more critical.  Who
> > knows, if the kinks gradually get ironed out of the osm2pgsql
> > program we may even begin to see the main mapnik tile generator move
> > to using changesets.
>
> I would love to rely on these exclusively, it's much more efficient.
> But I was seeing a fair bit of information fall through the cracks so
> that's why I'm re-synching to planet every four weeks.


Again, please provide some specific examples.  If data is being missed I'd
like to know about it.  Osmosis provides some tools that may be useful
here.  You can download a planet, apply changesets for a week, then compare
against the next planet and see what the differences are.  Obviously both
planets would need the appropriate changesets applied to bring them to the
same point in time before comparing them; otherwise the results will be noisy.
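The comparison described above boils down to a set difference between two snapshots: the older planet with a week of changesets applied (the "expected" state) and the next planet (the "actual" state). The sketch below is a plain-Python illustration of that idea, not Osmosis's own comparison tooling. It assumes uncompressed OSM XML input and compares each element's `timestamp` attribute by default (the attribute is a parameter in case another one, such as a version number, is available); `snapshot` and `compare` are hypothetical names.

```python
import xml.etree.ElementTree as ET

ELEMENTS = {"node", "way", "relation"}


def snapshot(source, attr="timestamp"):
    """Map (type, id) -> value of `attr` for every element in an OSM XML file or stream."""
    result = {}
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in ELEMENTS:
            result[(elem.tag, int(elem.get("id")))] = elem.get(attr)
            elem.clear()  # discard children (nd, tag, member) once read
    return result


def compare(expected, actual):
    """Return (missing, extra, changed) element keys, each sorted.

    missing: in the expected state but absent from the actual planet
    extra:   in the actual planet but never produced by the changesets
    changed: present in both but with a different attribute value
    """
    missing = sorted(expected.keys() - actual.keys())
    extra = sorted(actual.keys() - expected.keys())
    changed = sorted(k for k in expected.keys() & actual.keys()
                     if expected[k] != actual[k])
    return missing, extra, changed
```

Anything in `missing` or `changed` after a noise-free setup would be a concrete example of data falling through the cracks.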

I probably should do some of these comparisons myself, but again I just
haven't found the time yet, and nobody else has complained about missing data.
The minute changesets run 5 minutes behind the API, so they could potentially
miss data if a lock is held for several minutes.  The daily and hourly
changesets run at least 20 minutes behind the API (I forget the exact figure
off the top of my head) and should be extremely unlikely to miss data.
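The reason a longer delay is safer can be shown with a deliberately simplified model. It assumes, for illustration only, that each changeset covering the window [start, end) is extracted once, at end + lag, and contains only edits that are stamped inside the window and already committed at that moment. An edit whose transaction holds a lock for longer than the lag then slips through. The function below is hypothetical and is not how Osmosis is actually implemented; the 20 minute figure is the approximate one quoted above.

```python
from datetime import datetime, timedelta


def is_missed(stamped_at, visible_at, start, end, lag):
    """Simplified model of a timestamp-window extract.

    The changeset for [start, end) is assumed to be generated once, at
    end + lag, and to include only edits stamped inside the window that
    are already visible (committed) at that moment.
    """
    in_window = start <= stamped_at < end
    committed_too_late = visible_at > end + lag
    return in_window and committed_too_late


# An edit stamped 12:00:30 whose transaction holds a lock for six minutes.
stamp = datetime(2008, 10, 27, 12, 0, 30)
visible = stamp + timedelta(minutes=6)

minute = is_missed(stamp, visible, datetime(2008, 10, 27, 12, 0),
                   datetime(2008, 10, 27, 12, 1), timedelta(minutes=5))
hourly = is_missed(stamp, visible, datetime(2008, 10, 27, 12, 0),
                   datetime(2008, 10, 27, 13, 0), timedelta(minutes=20))
print(minute, hourly)  # True False
```

Under this model the minute extract (run at 12:06) misses the edit, while the hourly extract (run at 13:20) picks it up.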

Brett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/talk/attachments/20081028/5b4398b3/attachment.html>


More information about the talk mailing list