[OSM-dev] OSM binary format (pbf) 1.0 is in osmosis trunk.

Wed Oct 13 00:53:12 BST 2010

On Thu, Sep 16, 2010 at 10:23 AM, Scott Crosby <scrosby at cs.rice.edu> wrote:

> > For now, for simplicity, I'm going to revert to the same metadata as
> > the XML format. Just a BBox and source field. I'll make them both
> > optional, making it easier to upgrade metadata features in the future.
> > When/if there is a consensus for additional metadata fields, support
> > for them can be added then.
> >
>
> > I'll be releasing an rc2 at some point.
>
> This has been done. RC2 is in osmosis trunk. Changes are almost
> exclusively to the underlying osmbin.jar with no format
> incompatibilities. Changes include:
>
>
Sorry it's taken so long. Personal reasons have kept me away from this work.

I have committed the 1.0 version of the 'osmbin' jar to osmosis trunk. I
have also increased the maximum size of a header and a fileblock to 64 KB and
32 MB respectively. (These limits are used to detect corrupt files.) I
believe I have also fixed the two reported bugs: Frederick's bug, where the
wrong error message was reported, and the negative UserID bug.
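
The post does not say how the size limits are applied when reading. Below is a minimal Python sketch, assuming the fileblock layout from the PBF format documentation on the OSM wiki (a 4-byte big-endian BlobHeader length, a BlobHeader message whose field 3 is `datasize`, then the Blob itself) and reading "64kb"/"32mb" as binary units. The function and exception names are hypothetical, not part of osmbin or osmosis.

```python
import struct

# Sanity limits from the post; treating KB/MB as KiB/MiB is an assumption.
MAX_HEADER_SIZE = 64 * 1024
MAX_BLOB_SIZE = 32 * 1024 * 1024


class CorruptFileError(Exception):
    """Raised when a size field is implausible or the stream is truncated."""


def read_varint(buf, pos):
    """Decode a protobuf base-128 varint starting at buf[pos]."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise CorruptFileError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def blob_datasize(header):
    """Extract field 3 (datasize) from a serialized BlobHeader message."""
    pos = 0
    while pos < len(header):
        key, pos = read_varint(header, pos)
        field, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint field
            value, pos = read_varint(header, pos)
            if field == 3:
                return value
        elif wire_type == 2:  # length-delimited: type string, indexdata bytes
            length, pos = read_varint(header, pos)
            pos += length
        else:
            raise CorruptFileError(f"unexpected wire type {wire_type}")
    raise CorruptFileError("BlobHeader has no datasize")


def read_fileblock(stream):
    """Read one fileblock, rejecting sizes above the limits as corruption."""
    prefix = stream.read(4)
    if len(prefix) < 4:
        return None  # end of file
    (header_len,) = struct.unpack(">I", prefix)  # network byte order
    if header_len > MAX_HEADER_SIZE:
        raise CorruptFileError(f"header size {header_len} exceeds limit")
    header = stream.read(header_len)
    if len(header) < header_len:
        raise CorruptFileError("truncated header")
    datasize = blob_datasize(header)
    if datasize > MAX_BLOB_SIZE:
        raise CorruptFileError(f"blob size {datasize} exceeds limit")
    blob = stream.read(datasize)
    if len(blob) < datasize:
        raise CorruptFileError("truncated blob")
    return header, blob
```

Checking the lengths before allocating or reading means a garbage length prefix fails fast with a clear error instead of triggering a huge read.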

The only things left are to rename 'osmbin' to 'osmpbf' to match the name of
the format, to put a copy of the source code on OSM's SVN server, and to
find a good home for the jar. (Any suggestions?) For now, the jar lives in
osmosis's SVN repository and the source is on GitHub.

In osmosis, the important change is that the tasks have been renamed to
match the *.pbf file extension and are now --write-pbf and --read-pbf. I am
keeping the old task names --read-bin and --write-bin so that
existing scripts will work, but please fix your scripts. I also made one
small API change: the timestamp metadata field should have been an int64,
not an int32. This is not a format-compatibility change, but it may require
minor changes to code using the protobuf definitions.
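
Why widening the field leaves existing files readable: protobuf writes int32 and int64 fields as the same base-128 varint, and in both cases negative values are sign-extended to 64 bits, so the bytes on disk do not change; only the type in the generated code does. The hand-rolled encoder below is a sketch to illustrate this, not the protobuf library itself. It also shows that a value above 2^31 - 1 fits only the int64 field.

```python
def encode_varint(value, bits):
    """Encode `value` as protobuf does for an int32 (bits=32) or int64 (bits=64) field."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in int{bits}")
    # int32 and int64 both sign-extend negatives to 64 bits before encoding.
    value &= (1 << 64) - 1
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


if __name__ == "__main__":
    for v in (300, -1, 2**31 - 1):
        same = encode_varint(v, 32) == encode_varint(v, 64)
        print(v, encode_varint(v, 64).hex(), "identical" if same else "different")
    print(encode_varint(2**31, 64).hex())  # representable only as int64
```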

I am not sure when I will have time to update the wiki with the
documentation of the pbf tasks. For now, I am attaching a description of all
of the options.

Scott

///////////////
// --write-pbf

Arguments:

  file=<filename>  Currently '-' representing stdout is not supported.

  compress=deflate (default)  Use deflate compression on each block.
  compress=none  Disable compression. Writing is about twice as fast, but
                 the file is about twice the size.

  batchlimit=8000  Number of entities batched into each block before it is
compressed. This is a reasonable default. Batch limits that are too large may
cause blocks to exceed the fileblock size limit.

  granularity=100  The granularity or precision used to store coordinates.
The default of 100 nanodegrees is the highest precision used by OSM,
corresponding to about 1.1cm at the equator. In the current osmosis
implementation, the granularity must be a multiple of 100. If map data is
going to be exported to software that does not need the full precision,
increasing the granularity to 10000 nanodegrees can save about 10% of the
file size, while still having 1.1m precision.

  omitmetadata=false (default)
  omitmetadata=true  Omit non-geographic metadata on OSM entities. This
includes the version number and timestamp of the last edit to the entity, as
well as the user name and id of the last modifier. Omitting this metadata can
reduce file size by about 15% when exporting to software that does not need it.

  usedense=true (default)  Nodes can be represented in a regular format or a
dense format. The dense format is about 30% smaller, but more complex. To
make it easier to interoperate with (future) software that chooses not to
implement the dense format, the dense format may be disabled.
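
A simplified Python sketch of two ideas from the options above: quantizing a coordinate to integer steps of `granularity` nanodegrees (decoded as 1e-9 * (offset + granularity * value), following the PBF format documentation on the OSM wiki), and the delta coding that the dense format applies to node ids and coordinates, so that neighbouring nodes produce small numbers that encode as short varints. Tags, metadata, and protobuf serialization are left out; all names are hypothetical, and 111,320 m per degree at the equator is an approximation.

```python
METRES_PER_DEGREE = 111_320  # approximate length of one degree at the equator


def to_units(degrees, granularity=100, offset=0):
    """Quantize a coordinate to an integer number of granularity steps."""
    nanodegrees = round(degrees * 1_000_000_000)
    return round((nanodegrees - offset) / granularity)


def from_units(units, granularity=100, offset=0):
    """Inverse of to_units: degrees = 1e-9 * (offset + granularity * units)."""
    return 1e-9 * (offset + granularity * units)


def precision_metres(granularity):
    """Size of one quantization step at the equator, in metres."""
    return granularity * 1e-9 * METRES_PER_DEGREE


def delta_encode(values):
    """Replace each value by its difference from the previous one."""
    out, prev = [], 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas):
    """Running sum that restores the original values."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def zigzag(n):
    """sint64 zigzag mapping 0, -1, 1, -2, ... to 0, 1, 2, 3, ... (short varints)."""
    return (n << 1) ^ (n >> 63)


if __name__ == "__main__":
    ids = [1000001, 1000002, 1000005]
    lats = [51.5074123, 51.5074456, 51.5073900]
    print(delta_encode(ids))  # [1000001, 1, 3]
    print([zigzag(d) for d in delta_encode([to_units(x) for x in lats])])
    print(precision_metres(100), precision_metres(10000))  # about 1.1 cm and 1.1 m
```

The first id and latitude are stored in full, but every following delta is only a few bits long, which is where the dense format gets its savings on runs of nearby nodes.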

// --read-pbf

Arguments:

  file=<filename>  Currently '-' representing stdin is not supported.

// Usage tips:

The default options for reading and writing are safe, efficient, and
fast.

Buffering can improve performance. The binary format processes data in
batches: entities are queued until a limit is reached, then that batch
is serialized and compressed. This serialization can run concurrently
with other osmosis processing. With more than one core, writing
throughput can be increased by about 60% by placing a buffer in the
processing pipeline just before writing. Similarly, a buffer placed
in the pipeline immediately after parsing can improve read
concurrency.

For example:

osmosis --read-pbf file=XXX  --b bufferCapacity=12000 ....

   OR

osmosis .... --b bufferCapacity=12000 --write-pbf file=XXX ...
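
The batching and buffering described above can be pictured with a small producer/consumer sketch. This is an analogy, not osmosis's implementation: entities are modeled as already-serialized byte strings (a real PBF writer builds PrimitiveBlocks with string tables), the bounded queue stands in for the --b task, and each batch is deflated with zlib, roughly what compress=deflate does per block. All function names are hypothetical.

```python
import queue
import threading
import zlib

_DONE = object()  # sentinel marking the end of the entity stream


def batch_and_compress(entities, batch_limit=8000):
    """Group serialized entities into batches and deflate each batch."""
    blocks, batch = [], []
    for entity in entities:
        batch.append(entity)
        if len(batch) >= batch_limit:
            blocks.append(zlib.compress(b"".join(batch)))
            batch = []
    if batch:
        blocks.append(zlib.compress(b"".join(batch)))
    return blocks


def buffered_write(produce, batch_limit=8000, capacity=12000):
    """Run the producer and the batching/compressing writer in separate threads.

    The bounded queue plays the role of a buffer: the producer blocks only when
    `capacity` entities are waiting, so upstream work overlaps with compression.
    """
    buf = queue.Queue(maxsize=capacity)
    result = []

    def drain():
        while True:
            item = buf.get()
            if item is _DONE:
                return
            yield item

    writer = threading.Thread(
        target=lambda: result.extend(batch_and_compress(drain(), batch_limit)))
    writer.start()
    try:
        for entity in produce():
            buf.put(entity)  # blocks only when the buffer is full
    finally:
        buf.put(_DONE)
        writer.join()
    return result


if __name__ == "__main__":
    blocks = buffered_write(lambda: (b"node%d;" % i for i in range(20000)))
    print(len(blocks), "compressed blocks")  # 20000 entities / 8000 per batch -> 3
```

With more than one core, the producer keeps parsing or filtering while earlier batches are being compressed, which is the overlap the buffer is meant to create.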

When generating data for export to other applications, I suggest
considering omitmetadata=true and granularity=10000. Each option
reduces the size by about 1 GB. With both options, a full planet (in
2010), including all nodes, ways, and tags, fits in 5.5 GB.