[OSM-dev] Experimental history files too

Tue Jan 28 11:11:26 UTC 2014

On 28.01.2014 03:18, Matt Amos wrote:
> On Fri, Jan 24, 2014 at 2:56 PM, Peter Körner <osm-lists at mazdermind.de> wrote:
>> What do you think about adopting the osmium-naming-scheme for history files?
> 
> personally, i think it's misleading
> [...]
> in the case of .osm files, they're all potentially history files, and
> the file format does not change depending on whether multiple versions
> are present for a single ID or not.

History files carry an extra attribute (the boolean "visible") that
distinguishes them from regular .osm files.
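A rough way to tell the two kinds of .osm XML apart along these lines is to look for that attribute. The Python sketch below uses only the standard library; the helper name is hypothetical, and the check is a heuristic assumption, since some tools may also write "visible" on non-history data.

```python
import io
import xml.etree.ElementTree as ET

OSM_OBJECT_TAGS = {"node", "way", "relation"}


def has_visible_attribute(source) -> bool:
    """Hypothetical heuristic: True if any node/way/relation in the .osm XML
    carries a 'visible' attribute, as history dumps do to mark deleted versions."""
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag not in OSM_OBJECT_TAGS:
            continue
        # Attributes are already available on the "start" event.
        if event == "start" and "visible" in elem.attrib:
            return True
        if event == "end":
            elem.clear()  # drop children/attributes we no longer need
    return False


if __name__ == "__main__":
    sample = b'<osm version="0.6"><node id="1" version="2" visible="false"/></osm>'
    print(has_visible_attribute(io.BytesIO(sample)))  # True
```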

Also, it's not only about the parser. osm2pgsql's parser is perfectly
capable of parsing files with multiple versions of an object, but the
rest of the processing chain will crash with weird errors.

The same goes for Osmosis: its parser will work, but only a small
fraction of its tasks will.

So from a data user's point of view it does not matter that the formats
are actually quite similar: as long as separate programs and tools are
required to handle the two types of file, they are effectively different.

I'd rather compare it to TIFF and AIFF. While they are quite similar
from a file-format point of view, no one would argue that audio files
and image files should be handled as if there were no difference.

> whether something is a "history"
> osm file or a "current" osm file is a matter of the content - so
> wanting a different extension is a bit like wanting .png for
> truecolour images and .pgr for greyscale images (in the same PNG
> format).
The main difference here is that every application capable of reading
PNG MUST be capable of reading .pgr as well. That's not true for
.osm/.osh applications.

Also, the tasks you can perform on both PNG variants are the same
(display, crop, combine, ...). This is not true for .osm/.osh files:
one can import both into a database, but the database schema required
and the operations possible on those databases are quite different.

> having said that, it would seem reasonable to add a flag to
> the document element to indicate whether the .osm file is a special
> case, having a single version for each ID, as many programs seem to
> rely on this assumption and it would be better to be able to check it.
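Until such a flag exists, a tool can check the single-version assumption itself. A minimal Python sketch, standard library only, with a hypothetical helper name: it returns False as soon as a (type, id) pair repeats. The set of seen IDs grows with the file, so this is only practical for extracts, not for a full planet.

```python
import io
import xml.etree.ElementTree as ET


def has_single_version_per_id(source) -> bool:
    """Hypothetical check: True if no (type, id) pair occurs twice in the .osm XML."""
    seen = set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in ("node", "way", "relation"):
            # Node 1 and way 1 are different objects, so key on the type too.
            key = (elem.tag, elem.get("id"))
            if key in seen:
                return False
            seen.add(key)
            elem.clear()  # free the element's children and attributes
    return True


if __name__ == "__main__":
    sample = b'<osm><node id="1" version="1"/><node id="1" version="2"/></osm>'
    print(has_single_version_per_id(io.BytesIO(sample)))  # False
```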
Isn't that already implemented via the required_features header field of
PBF files?

See
<https://github.com/joto/osmium/blob/master/include/osmium/output/pbf.hpp#L880>
for a reference.
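In the OSM PBF format, "HistoricalInformation" is one of the required_features strings in the HeaderBlock, and a reader is expected to refuse a file whose required features it does not support. The Python sketch below only shows that decision. It assumes the required_features strings have already been decoded from the HeaderBlock (the protobuf decoding is not shown); the function name and the set of supported features are hypothetical.

```python
# Required features this hypothetical reader understands.
SUPPORTED_FEATURES = {"OsmSchema-V0.6", "DenseNodes", "HistoricalInformation"}


def classify_pbf_header(required_features, supports_history=False):
    """Return "history" or "current"; raise ValueError if the file can't be handled."""
    features = set(required_features)
    unknown = features - SUPPORTED_FEATURES
    if unknown:
        # Unknown required features mean the file must be rejected.
        raise ValueError(f"unsupported required features: {sorted(unknown)}")
    if "HistoricalInformation" in features:
        if not supports_history:
            raise ValueError("file may contain several versions per object")
        return "history"
    return "current"


if __name__ == "__main__":
    print(classify_pbf_header(["OsmSchema-V0.6", "DenseNodes"]))  # current
```

A tool that assumes one version per ID would call this with supports_history=False and fail early instead of producing strange errors later in its processing chain.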

> the generation is synced to the backup database dumps, so the clock
> starts running early Tuesday, when Monday's backup is complete. they
> seem to be fairly reliably finished by Wednesday morning, so it's
> probably safe to start looking for them then - although they'll be
> named for Monday's date.

Thank you for that info. I'll see when I can set up my regular splitting
task and announce it separately.

>> If xml-writing takes only half as long as xml-reading, I'd think twice
>> about supplying xml-based files. Nobody really has fun reading such huge
>> files with expat. And if it's really necessary, there's always
>> osmium_convert, which will generate XML from PBF dumps or extracts locally.
> 
> this is a discussion which could probably continue forever. my opinion
> is that it's worthwhile distributing files which are sort-of
> human-readable, in a well-known format/markup for which many libraries
> exist in many languages, compressed with standard tools, and in the
> same format as the API.
Got your point.

Peter