[OSM-dev] OSM binary format (pbf) 1.0 is in osmosis trunk.

Sat Oct 16 07:39:14 BST 2010

I've released version 0.37 of Osmosis with this included.  I haven't had a
chance to update the wiki either.  I'll try to do so over the next few days
if nobody beats me to it.

If anybody sees any issues with the binary support, please let Scott and me
know.  I'm now building Osmosis via an automated Hudson process so pushing
out new builds should be fairly quick once a fix is identified.

On Wed, Oct 13, 2010 at 10:53 AM, Scott Crosby <scrosby at cs.rice.edu> wrote:

> On Thu, Sep 16, 2010 at 10:23 AM, Scott Crosby <scrosby at cs.rice.edu> wrote:
>
>> > For now, for simplicity, I'm going to revert to the same metadata as
>> > the XML format. Just a BBox and source field. I'll make them both
>> > optional, making it easier to upgrade metadata features in the future.
>> > When/if there is a consensus for additional metadata fields, support
>> > for them can be added then.
>> >
>>
>> > I'll be releasing an rc2 at some point.
>>
>> This has been done. RC2 is in osmosis trunk. Changes are almost
>> exclusively to the underlying osmbin.jar with no format
>> incompatibilities. Changes include:
>>
>>
> Sorry it's taken so long. Personal reasons have kept me away from this work.
>
> I have committed the 1.0 version of the 'osmbin' jar to osmosis trunk. I
> have also increased the maximum size of a header or fileblock to 64 KB and
> 32 MB respectively. (These limits are used to detect corrupt files.) I
> believe I have also fixed the two reported bugs, Frederick's bug with
> reporting the wrong error message and the negative UserID bug.
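
For anyone writing an independent reader, the size check described above
might look roughly like the sketch below. It is not the osmbin code. It
assumes the limits are exactly 64 KiB and 32 MiB and that each fileblock
begins with a 4-byte big-endian header length; the constant and function
names are invented for illustration.

```python
import struct

# Assumed values for the limits mentioned above (64 KiB and 32 MiB).
MAX_HEADER_SIZE = 64 * 1024
MAX_BLOB_SIZE = 32 * 1024 * 1024


def read_header_length(prefix: bytes) -> int:
    """Decode a 4-byte big-endian header length and sanity-check it."""
    if len(prefix) != 4:
        raise ValueError("truncated file: expected 4 length bytes")
    (length,) = struct.unpack(">I", prefix)
    if length > MAX_HEADER_SIZE:
        raise ValueError(
            f"header of {length} bytes exceeds {MAX_HEADER_SIZE}; "
            "file is probably corrupt"
        )
    return length


def check_blob_size(datasize: int) -> int:
    """Reject a block whose declared size is negative or over the limit."""
    if not 0 <= datasize <= MAX_BLOB_SIZE:
        raise ValueError(
            f"block of {datasize} bytes is outside 0..{MAX_BLOB_SIZE}; "
            "file is probably corrupt"
        )
    return datasize
```

Checking the declared sizes before allocating a buffer means that a
damaged length field fails fast instead of triggering a huge read.
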
>
> The only thing left is to rename 'osmbin' to 'osmpbf' to match the name of
> the format, and put a copy of the source code into OSM's SVN server and to
> find a good home for the jar. (Any suggestions?) For now the jar lives in
> osmosis's SVN repository and the source is on github.
>
> In osmosis, the important change is that the tasks have been renamed to
> match the *.pbf file extension and are now --write-pbf and --read-pbf. I am
> leaving behind the old task names --read-bin and --write-bin so that
> existing scripts will work, but please fix your scripts. I also made one
> small API change. The timestamps metadata field should have been an int64,
> not an int32. This is not a format-compatibility change, but it may require
> minor changes to code using the protobuf definitions.
>
> I am not sure when I will have time to update the wiki with the
> documentation of the pbf tasks. For now, I am attaching a description of all
> of the options.
>
> Scott
>
> ///////////////
> // --write-pbf
>
> Arguments:
>
>   file=<filename>  Currently '-' representing stdout is not supported.
>
>   compress=deflate (default) Use deflate compression on each block
>   compress=none Disable compression. About twice as fast to write and
>                 twice the size.
>
>   batchlimit=8000  Block size used when compressing. This is a reasonable
>                    default. Batchlimits that are too big may cause files to
>                    exceed the defined filesize limits.
>
>   granularity=100  The granularity or precision used to store coordinates.
>                    The default of 100 nanodegrees is the highest precision
>                    used by OSM, corresponding to about 1.1 cm at the equator.
>                    In the current osmosis implementation, the granularity
>                    must be a multiple of 100. If map data is going to be
>                    exported to software that does not need the full
>                    precision, increasing the granularity to 10000
>                    nanodegrees can save about 10% of the file size, while
>                    still having 1.1 m precision.
>
>   omitmetadata=false (default)
>   omitmetadata=true  Omit non-geographic metadata on OSM entities. This
>                      includes the version number and timestamp of the last
>                      edit to the entity as well as the user name and id of
>                      the last modifier. Omitting this metadata can save 15%
>                      of the file size when exporting to software that does
>                      not need this data.
>
>   usedense=true (default)  Nodes can be represented in a regular format or
>                  a dense format. The dense format is about 30% smaller, but
>                  more complex. To make it easier to interoperate with
>                  (future) software that chooses to not implement the dense
>                  format, the dense format may be disabled.
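
The granularity figures above can be checked with a little arithmetic.
The sketch below is only an illustration: it assumes a spherical
approximation using the WGS84 equatorial radius, ignores the offset
fields a real encoder would also apply, and the function names are
invented.

```python
import math

EARTH_EQUATORIAL_RADIUS_M = 6_378_137.0  # WGS84 semi-major axis


def granularity_to_metres(granularity_nanodeg: int) -> float:
    """Ground distance at the equator covered by one granularity step."""
    degrees = granularity_nanodeg * 1e-9
    return math.radians(degrees) * EARTH_EQUATORIAL_RADIUS_M


def quantise(coord_deg: float, granularity_nanodeg: int) -> float:
    """Round a coordinate to the nearest multiple of the granularity."""
    units = round(coord_deg * 1e9 / granularity_nanodeg)
    return units * granularity_nanodeg / 1e9


if __name__ == "__main__":
    for g in (100, 10000):
        print(f"granularity={g}: {granularity_to_metres(g):.4f} m per step")
```

A coarser granularity makes the stored integers smaller, which is
presumably where the file size saving comes from.
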
>
> // --read-pbf
>
>
> Arguments:
>
>    file=<filename>   Currently '-' representing stdin is not supported.
>
> // Usage tips:
>
> The default options for reading and writing are the safe options and
> work efficiently and quickly.
>
> Buffering can improve performance. The binary format processes data in
> batches: entities are queued until a limit is reached, then that batch
> is serialized and compressed. This serialization can run concurrently
> with other osmosis processing. With more than one core, writing
> throughput can be increased by about 60% by placing a buffer in the
> processing pipeline just before writing. Similarly, a buffer placed
> in the pipeline immediately after parsing can improve read
> concurrency.
>
> Eg:
>
> osmosis --read-pbf file=XXX  --b bufferCapacity=12000 ....
>
>    OR
>
> osmosis .... --b bufferCapacity=12000 --write-pbf file=XXX ...
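
To illustrate why a buffer helps, here is a minimal producer/consumer
sketch of the arrangement described above. It is not how osmosis is
implemented: the names are invented, the capacity and batch size simply
reuse the numbers from this message, and in Python the interpreter lock
limits the real speedup, so this shows the structure only.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the entity stream


def run_buffered(entities, write_batch, buffer_capacity=12000, batch_limit=8000):
    """Feed entities through a bounded queue to a batching writer thread."""
    buf = queue.Queue(maxsize=buffer_capacity)

    def writer():
        batch = []
        while True:
            item = buf.get()
            if item is _END:
                break
            batch.append(item)
            if len(batch) >= batch_limit:
                write_batch(batch)  # stands in for serialize + compress
                batch = []
        if batch:
            write_batch(batch)  # flush the final partial batch

    thread = threading.Thread(target=writer)
    thread.start()
    for entity in entities:
        buf.put(entity)  # blocks only when the buffer is full
    buf.put(_END)
    thread.join()
```

Without the queue, the upstream stage would stall every time a batch is
being serialized; with it, the producer keeps running until the buffer
fills.
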
>
>
> When generating data for export to other applications, I suggest
> considering omitmetadata=true and granularity=10000. Each option
> reduces the size by about 1 GB. With both options, a full planet (in
> 2010), including all nodes, ways, and tags, fits in 5.5 GB.
>
>
>