[OSM-dev] Osmosis Replication Statistics
Brett Henderson
brett at bretth.com
Wed Aug 22 23:46:47 BST 2007
Martijn van Oosterhout wrote:
> Can you not diff the most recent planet against your generated
> version, that should at least tell you where to look. Remember, even
> simple things like using <tag></tag> instead of <tag/> can blow up the
> file incredibly.
>
Yes, I can do that, although it sounds like the TIGER deletes could be
the culprit.
Identifying the offending data is slightly tricky because a planet isn't
a consistent snapshot, so it's difficult to compare apples with apples.
But it should be possible using the following process.
1. Generate a snapshot planet using osmosis as of 20070701 (the date
shouldn't be terribly important but should match an existing planet dump).
2. Obtain an equivalent planet.
3. Extract all entities from the osmosis snapshot that haven't been
updated since a much earlier date (say 20070101).
4. Cross check this entity list against the entities in the planet.
5. All entities existing in the osmosis list that don't exist in the
planet should have delete records added to the history tables.
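As a rough illustration of steps 3 to 5, here is a minimal Python sketch
of the cross check. It is not osmosis code. It assumes both files are
planet-style XML whose entities are node, segment and way elements
carrying id and timestamp attributes, and that the timestamp starts with
a YYYY-MM-DD date. The function names are made up for this example.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: entity element names in the planet XML being compared.
ENTITY_TAGS = {"node", "segment", "way"}


def entity_keys(source, unchanged_since=None):
    """Return the set of (element name, id) pairs found in an OSM XML stream.

    If unchanged_since (a "YYYY-MM-DD" string) is given, keep only entities
    whose timestamp date is earlier than it. Assumes the timestamp attribute
    begins with an ISO date, so plain string comparison orders correctly.
    """
    keys = set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in ENTITY_TAGS:
            stamp = elem.get("timestamp", "")
            if unchanged_since is None or (stamp and stamp[:10] < unchanged_since):
                keys.add((elem.tag, elem.get("id")))
            elem.clear()  # drop child elements we no longer need
    return keys


def missing_deletes(osmosis_snapshot, planet, unchanged_since="2007-01-01"):
    """Entities in the osmosis snapshot, untouched since the cutoff date,
    that are absent from the planet: candidates for missing delete records."""
    stale = entity_keys(osmosis_snapshot, unchanged_since)  # step 3
    present = entity_keys(planet)                           # step 4
    return sorted(stale - present)                          # step 5


# Tiny made-up inputs standing in for the two planet files.
snapshot_xml = io.BytesIO(
    b'<osm>'
    b'<node id="1" timestamp="2006-05-01T00:00:00Z"/>'
    b'<node id="2" timestamp="2006-06-01T00:00:00Z"/>'
    b'<node id="3" timestamp="2007-03-01T00:00:00Z"/>'
    b'</osm>'
)
planet_xml = io.BytesIO(
    b'<osm><node id="1" timestamp="2006-05-01T00:00:00Z"/></osm>'
)
candidates = missing_deletes(snapshot_xml, planet_xml)
```

Restricting the snapshot side to entities untouched since the earlier
date is what sidesteps the planet not being a consistent snapshot:
anything that old should appear in both files unless a delete was lost.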
Does this sound feasible? Are there other verification steps that could
be performed such as checking for TIGER tags? Was *all* TIGER data
deleted from the database or only some? If it's all TIGER data then it
may be much simpler to fix the problem by simply adding a delete record
for every TIGER entity in the database.
>
>> Problem 2
>> I've examined a random sample of the changes between my two 20070101.osm
>> files. For each change I examined the history of the entity in
>> question. In every case I've checked the change can be explained by the
>> fact that the two most recent history rows (as of beginning 2007) have
>> identical timestamps. This means my queries sometimes return one row,
>> sometimes the other depending on the particular query characteristics.
>> I don't think there's much I can do about this. Given that it is a very
>> small set of changes, it is probably something we can live with and fix
>> on a case by case basis as problems are picked up.
>>
>
> Got some examples?
>
The changes between the two 20070101.osm files were attached to my
initial post, which probably wasn't clear. The changes are in the
direction of snapshot->derived.
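To show the kind of ambiguity described under Problem 2, here is a small
Python sketch. The rows are invented, and this is not the query osmosis
runs; it only mimics a database handing back the same history rows in a
different order.

```python
from datetime import datetime

# Hypothetical history rows for one entity: (row label, timestamp).
rows = [
    ("a", datetime(2006, 12, 30, 10, 0, 0)),
    ("b", datetime(2006, 12, 31, 9, 15, 0)),
    ("c", datetime(2006, 12, 31, 9, 15, 0)),  # same timestamp as "b"
]


def latest_as_of(history, cutoff):
    """Pick the most recent row before the cutoff, comparing timestamps only."""
    candidates = [row for row in history if row[1] < cutoff]
    return max(candidates, key=lambda row: row[1])


cutoff = datetime(2007, 1, 1)
first = latest_as_of(rows, cutoff)                   # rows in one order
second = latest_as_of(list(reversed(rows)), cutoff)  # same rows, other order
```

With these rows, first is row "b" and second is row "c": both carry the
latest timestamp, so which one is returned depends only on the order in
which the rows happen to arrive.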