Mark Granger grangerfx at gmail.com
Tue Sep 1 22:37:13 BST 2009

I wish to request a change in the way the weekly planet files are generated. The planet files are supposed to contain a weekly snapshot of the OSM database, but they are generated over a period during which users can still make changes to the database. This leaves the weekly planet files in an inconsistent state. I ran into this when I wrote my own planet file parser and found that a few hundred to a few thousand ways in each file referenced nonexistent nodes. There are probably also some nodes with stale locations and some ways that reference the wrong nodes, but my parser cannot detect those issues.
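To illustrate the kind of check that exposes the problem, here is a minimal Python sketch (not my actual parser) that streams an OSM XML file and reports ways whose node references never appear as nodes. The function name find_dangling_ways and the sample data are made up for illustration, and the sketch keeps every node id in memory, so it is only practical for small extracts rather than a full planet file.

```python
import io
import xml.etree.ElementTree as ET


def find_dangling_ways(source):
    """Map way id -> list of referenced node ids that never appear as <node>.

    `source` is a file object (or path) holding uncompressed OSM XML.
    """
    node_ids = set()
    way_refs = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "node":
            node_ids.add(elem.get("id"))
            elem.clear()  # drop attributes and child tags we no longer need
        elif elem.tag == "way":
            way_refs[elem.get("id")] = [nd.get("ref") for nd in elem.findall("nd")]
            elem.clear()

    # Check only after the whole file is read, so element order does not matter.
    dangling = {}
    for way_id, refs in way_refs.items():
        missing = [ref for ref in refs if ref not in node_ids]
        if missing:
            dangling[way_id] = missing
    return dangling


# Hand-written sample, not real OSM data: way 11 references node 3, which is absent.
SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="0" lon="0"/>
  <node id="2" lat="0" lon="1"/>
  <way id="10"><nd ref="1"/><nd ref="2"/></way>
  <way id="11"><nd ref="2"/><nd ref="3"/></way>
</osm>"""

print(find_dangling_ways(io.BytesIO(SAMPLE)))
```

A check like this only catches missing nodes. It cannot tell whether a node that is present has an out-of-date position.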

I asked about this problem on the forums and was told the following by Ldp:

"This is expected behaviour from the weekly planet dump. There is no guarantee of integrity on any data that was added to the OSM db after the time the planet file generation has started.

It takes a few hours to generate the planet file, and when the nodes have been dumped to the file, someone can still add a new way and associated new nodes. This could then show up only as a way, later in the file.

The only way to get a good file with referential integrity is to take the weekly planet dump, and add the next day's daily diff file to that:

1) Fetch the weekly planet (e.g. planet-090708.osm.bz2) and next day's diff (e.g. 20090708-20090709.osc.gz)

2) bzcat planet-090708.osm.bz2 | osmosis --rxc 20090708-20090709.osc.gz --rx - --ac --wx planet-090709.osm.gz"

This solution would probably work for me, although I think it would be faster to read the daily diff files myself and apply them while parsing the planet file.
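A rough Python sketch of what I mean by applying a diff while parsing is below. It assumes the diff is an osmChange document with create, modify and delete blocks, and that both inputs are already decompressed. The names load_changes and merged_elements and the sample data are hypothetical. It does not compare version numbers, and newly created elements are emitted at the end rather than in node/way/relation order, so it suits a parser that builds its own index.

```python
import io
import xml.etree.ElementTree as ET

ELEMENT_TAGS = ("node", "way", "relation")


def load_changes(osc_source):
    """Read an osmChange document into {(type, id): (action, element)}.

    Later entries overwrite earlier ones, so the last change to an element wins.
    """
    changes = {}
    for action in ET.parse(osc_source).getroot():
        if action.tag not in ("create", "modify", "delete"):
            continue
        for elem in action:
            changes[(elem.tag, elem.get("id"))] = (action.tag, elem)
    return changes


def merged_elements(planet_source, changes):
    """Yield planet elements with the changes applied on the fly.

    Each yielded planet element is cleared once the consumer asks for the
    next one, so consumers should copy what they need before continuing.
    """
    pending = dict(changes)
    for _event, elem in ET.iterparse(planet_source, events=("end",)):
        if elem.tag not in ELEMENT_TAGS:
            continue
        change = pending.pop((elem.tag, elem.get("id")), None)
        if change is None:
            yield elem          # untouched by the diff
        elif change[0] != "delete":
            yield change[1]     # created or modified: use the diff's version
        elem.clear()
    # Elements that exist only in the diff (typically newly created ones).
    for action, elem in pending.values():
        if action != "delete":
            yield elem


# Hand-written samples, not real OSM data.
PLANET = b"""<osm version="0.6">
  <node id="1" lat="0" lon="0"/>
  <node id="2" lat="0" lon="1"/>
  <way id="10"><nd ref="1"/><nd ref="2"/><nd ref="3"/></way>
  <way id="11"><nd ref="1"/><nd ref="2"/></way>
</osm>"""

OSC = b"""<osmChange version="0.6">
  <create><node id="3" lat="0" lon="2"/></create>
  <modify><node id="2" lat="0" lon="5"/></modify>
  <delete><way id="11"/></delete>
</osmChange>"""

for element in merged_elements(io.BytesIO(PLANET), load_changes(io.BytesIO(OSC))):
    print(element.tag, element.get("id"))
```

With the sample data, node 3 (missing from the planet sample but referenced by way 10) is supplied by the diff, which is the situation the osmosis recipe above is meant to repair.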

A better solution would be for the planet file creation process to wait a day and automatically apply the next day's diff file to the planet file before uploading it. The only downside I can see is that we would have to wait one more day for each planet file to be released. Considering that this would make the planet files internally consistent, I think it is a worthwhile tradeoff. I am assuming that the daily diff file reflects a consistent state of the entire OSM database and does not contain partial edits the way the planet files do.

One problem that this would solve is that there is currently no way (that I know of) to reliably rebuild the older weekly planet files so that they are complete. This is because the daily diff files are only archived for about two weeks, while the planet files are archived for years. It may be possible to merge two or more planet files to yield one complete planet file, but there is no guarantee that this will work 100% of the time, since the same nodes or ways could have been edited two weeks in a row while the planet file was being generated. I presume that the weekly planet files are being archived for some reason. It would be nice if they were complete snapshots.

-Mark Granger
