[OSM-dev] Release candidate for OSM binary format is in osmosis trunk.

Mon Sep 6 13:47:44 BST 2010

On Mon, Sep 6, 2010 at 7:02 AM, Brett Henderson <brett at bretth.com> wrote:
>> Making that information usable is a separate challenge as that state
>> information has to be pushed through the osmosis pipe both to and from
>> the format. I'd have to ask Brett to chime in as to how to do this; my
>> guess is as part of a Bounds object. I'm assuming that osmosis is used
>> to generate these dumps; if not, then the program generating the dump
>> has to be modified to generate the binary format.
>
> I don't have an answer on how to do this.  I've so far avoided adding
> anything like this to the pipeline because it is very difficult to make all
> tasks support it in a meaningful way.  The existing bound support is messy
> enough and I question its usefulness.  Bound support makes sense for an
> editor like JOSM, but much less sense for Osmosis.  JOSM owns the entire
> lifecycle of an OSM file, only supports the one file format, and typically
> receives all data from the API.  Osmosis is more generic, may receive files
> from a number of sources, and this makes it much harder to preserve and
> manipulate metadata.
>
> Adding this type of info would probably require a re-think of how metadata
> is passed through the pipeline.  One reason why Osmosis is so flexible is
> that the data model it supports is very simple.  Adding extra data to
> support specific use cases will make this more difficult to achieve.
>
> I'm not sure that it makes much sense to add replication state support to
> binary files without adding it to other storage formats as well (XML, pgsql,
> apidb?).
>
> That all sounds a bit negative which isn't really my intent.  I guess what
> I'm saying is that I'm not keen to see replication support specifically
> added to the pipeline, but rather a rethink of how metadata can be passed
> through the pipeline in a more generic fashion.  It needs to take into
> account all tasks and not just the couple that are used for one use case.
> This could potentially be used for replication data, bounds information, and
> maybe even other information such as whether the data has been manipulated
> in some fashion (e.g. the clipIncompleteEntities option on the bounding box
> task).

There's not a single thing here that I disagree with.

However, I think metadata can be extremely useful. It can make
replication support much more convenient, or indicate that a file is
already sorted. Metadata is also a problem that extends across file
formats, which means that I should punt on adding it to the binary
format until/unless there is a consensus on what metadata belongs
in the system, at which point it can be added to all formats. I would be happy
to contribute to such a discussion.

For now, for simplicity, I'm going to revert to the same metadata as
the XML format: just a BBox and a source field. I'll make them both
optional, making it easier to upgrade metadata features in the future.
When/if there is a consensus for additional metadata fields, support
for them can be added then.
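A minimal sketch of why making both fields optional eases later upgrades, written in Python for illustration. The names `Header`, `BBox` and `parse_header`, the field layout, and the dict-based decoding are all hypothetical and are not the binary format's actual schema. The idea is that a reader treats an absent field as "not provided" and ignores fields it does not recognise, so new metadata fields can be added later without breaking older readers.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names and layout, for illustration only; this is not
# the actual schema of the OSM binary format.


@dataclass(frozen=True)
class BBox:
    left: float
    bottom: float
    right: float
    top: float


@dataclass(frozen=True)
class Header:
    bbox: Optional[BBox] = None   # optional bounding box
    source: Optional[str] = None  # optional free-text source description


def parse_header(fields: dict) -> Header:
    """Build a Header from already-decoded key/value fields.

    Missing fields stay None. Keys other than "bbox" and "source" are
    never read, so a newer writer can add metadata that this older
    reader silently ignores.
    """
    raw_bbox = fields.get("bbox")
    bbox = BBox(*raw_bbox) if raw_bbox is not None else None
    return Header(bbox=bbox, source=fields.get("source"))
```

For example, `parse_header({"source": "planet dump", "replication_seq": 42})` yields a header with no bounding box, the given source, and the unknown (hypothetical) `replication_seq` field dropped.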

I'll be releasing an rc2 at some point.

Scott