[OSM-dev] Experimental history files too

Tue Jan 28 02:18:13 UTC 2014

On Fri, Jan 24, 2014 at 2:56 PM, Peter Körner <osm-lists at mazdermind.de> wrote:
> On 23.01.2014 18:28, Matt Amos wrote:
>> i encourage everyone to take a look and report back any problems you
>> find. my thanks to Peter Körner, who seems to already be doing this -
>> with no problems?
>
> No problems yet. Have run two splits of those files already
> (http://osm.personalwerk.de/full-history-extracts/)
>
> What do you think about adopting the osmium-naming-scheme for history files?
>  .osm.[bz2|gz|pbf] -> regular osm files
>  .osh.[bz2|gz|pbf] -> history files
>  .osc.[bz2|gz] -> changeset-files
>
> That would make detecting the kind of file at first glance easier and
> it also fits nicely into the .osc-file naming convention.

personally, i think it's misleading. osmchange is a related, but
different, format from osm xml and a parser which works for one will
not necessarily work for the other. therefore, having a different
extension seems reasonable.

in the case of .osm files, they're all potentially history files, and
the file format does not change depending on whether multiple versions
are present for a single ID or not. whether something is a "history"
osm file or a "current" osm file is a matter of the content - so
wanting a different extension is a bit like wanting .png for
truecolour images and .pgr for greyscale images (in the same PNG
format). having said that, it would seem reasonable to add a flag to
the document element to indicate whether the .osm file is a special
case, having a single version for each ID, as many programs seem to
rely on this assumption and it would be better to be able to check it.
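
in the absence of such a flag, a program can check the assumption
itself. the following is only a minimal sketch in python, and it assumes
the file is sorted by type and then ID, so that multiple versions of one
object sit next to each other. the function name is made up for
illustration.

```python
import io
import xml.etree.ElementTree as ET

OSM_TYPES = {"node", "way", "relation"}

def is_single_version(source):
    """Return True if no (type, id) pair occurs on two consecutive elements.

    Assumes the input is sorted by type and then ID; this is an assumption
    of the sketch, not something the file format guarantees.
    """
    previous = None
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in OSM_TYPES:
            current = (elem.tag, elem.get("id"))
            if current == previous:
                return False
            previous = current
            # drop children and attributes already inspected
            elem.clear()
    return True

# tiny hand-written documents, for illustration only
current_doc = (b'<osm version="0.6">'
               b'<node id="1" version="2" lat="0" lon="0"/>'
               b'<node id="2" version="1" lat="0" lon="0"/>'
               b'</osm>')
history_doc = (b'<osm version="0.6">'
               b'<node id="1" version="1" lat="0" lon="0"/>'
               b'<node id="1" version="2" lat="0" lon="0"/>'
               b'</osm>')

current_ok = is_single_version(io.BytesIO(current_doc))
history_ok = is_single_version(io.BytesIO(history_doc))
```

for a planet-sized file the cleared elements would still accumulate
under the root, so a real tool would want to prune them as it goes.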

> I'm going to implement a regular run that generates fresh extracts every
> week from the available file. Is there any note on which weekday the
> full-history-dumps are generated, so I can loosely sync my split-script
> to that rhythm?

great! :-)

the generation is synced to the backup database dumps, so the clock
starts running early Tuesday, when Monday's backup is complete. they
seem to be fairly reliably finished by Wednesday morning, so it's
probably safe to start looking for them then - although they'll be
named for Monday's date.
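
as a rough illustration of that rhythm, a split script could work out
which Monday's dump to look for. this is only a sketch: the function
name is hypothetical, and the wednesday cut-off is just the "probably
safe" guess from above, not a guarantee.

```python
from datetime import date, timedelta

def latest_dump_date(today):
    """Return the Monday whose dump is probably finished by `today`."""
    # date.weekday(): Monday is 0, Tuesday is 1, Wednesday is 2
    days_since_monday = today.weekday()
    monday = today - timedelta(days=days_since_monday)
    if days_since_monday < 2:
        # Monday or Tuesday: this week's dump is likely still being
        # generated, so fall back to the previous week's
        monday -= timedelta(days=7)
    return monday

wednesday_pick = latest_dump_date(date(2014, 1, 29))
tuesday_pick = latest_dump_date(date(2014, 1, 28))
```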

> If xml-writing takes only half as long as xml-reading I'd think twice
> about supplying xml-based files. nobody really has fun reading such huge
> files with expat. And if it's really necessary, there's always
> osmium_convert which will generate xmls from pbf-dumps or -extracts locally.

this is a discussion which could probably continue forever. my opinion
is that it's worthwhile distributing files which are sort-of
human-readable, in a well-known format/markup for which many libraries
exist in many languages, compressed with standard tools, and in the
same format as the API.

this way, it's possible for people to develop tools which work against
small map call downloads, then scale them to extracts and even the
whole planet. of course, it's a widely-held belief that xml sucks
irretrievably and, while it's certainly true that pbf is smaller and
parses faster, distributing only pbf would mean someone would have to
learn those extra tools/commands to start using the data.

xml, despite its many flaws, at least has myriad libraries, bindings
and tools which make it easier to experiment with processing and
transforming osm data. these experimental planet/history files are
also line-oriented, which means one can even do quick-and-dirty
grep/sed/awk work for ad-hoc analysis.
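
the same quick-and-dirty approach works from a scripting language. here
is a minimal python sketch that counts elements without an xml parser at
all. it assumes each node/way/relation opening tag starts on its own
line, possibly indented; the function name and the file name in the
usage note are made up.

```python
import bz2
from collections import Counter

def count_elements(lines):
    """Count node/way/relation opening tags in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        stripped = line.lstrip()
        for kind in ("node", "way", "relation"):
            # the trailing space keeps "<nd " from matching "<node "
            if stripped.startswith("<" + kind + " "):
                counts[kind] += 1
                break
    return counts

# tiny hand-written sample, for illustration only
sample = [
    '<osm version="0.6">',
    '  <node id="1" version="1" lat="0" lon="0"/>',
    '  <node id="1" version="2" lat="0" lon="0"/>',
    '  <way id="5" version="1">',
    '    <nd ref="1"/>',
    '  </way>',
    '</osm>',
]
sample_counts = count_elements(sample)
```

against a real file, something like
`with bz2.open("history.osm.bz2", "rt", encoding="utf-8") as f: count_elements(f)`
would stream the compressed dump line by line.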

cheers,

matt