[osmosis-dev] Announcement of the OSMbin file-format

Wed Feb 4 12:00:48 GMT 2009

Hello fellow developers,

I'd like to announce the availability of the final Version 1.0
of the OSMbin file-format and its reference-implementation.
http://wiki.openstreetmap.org/wiki/OSMbin(file_format)/version_1.0

   Why this file-format?

The XML-format of the OSM-protocol has become a great
interchange-format. It is used by editors (JOSM), data-manipulation-tools
(Osmosis) and renderers all over the place.
However, it is a data-interchange-protocol,
not an on-disk-format.
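
As a rough illustration of what that means in practice, here is a minimal
sketch (Python standard library only) of looking up one node in an OSM XML
document. The function name and the tiny sample document are made up for
this example. The point is that the parser has to stream past every element
in front of the one you want, so the cost grows with the size of the file.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up sample in the shape of an OSM XML file.
SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="52.5" lon="13.4"/>
  <node id="2" lat="48.1" lon="11.6"/>
</osm>"""

def find_node(xml_file, wanted_id):
    """Scan the file from the start until the wanted node turns up."""
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag == "node" and elem.get("id") == str(wanted_id):
            return float(elem.get("lat")), float(elem.get("lon"))
        elem.clear()  # drop content of elements we have already passed
    return None  # not in the file; we only know after reading all of it

print(find_node(io.BytesIO(SAMPLE), 2))
```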

If you write navigation-software, a moving-map application
or any other kind of application that needs to quickly look
up single entities in a large map, or to quickly look up related
entities (like the nodes of a way), then parsing and seeking
in an XML-file is not going to help you.
Up to now there were only 2 choices:
* require the user to install a large PostGIS or MySQL+GIS database
or
* specify, implement, optimize and debug your own special
  file-format (Navit, OSM Binary Format, ...).
If the first is not an option for your user-base, who may
want a single program that they install and just run,
there is now a third option:
an already specified file-format.

   What is it intended for?

OSMbin is meant to be a native on-disk-format
(as opposed to a transport-format)
for programs that 
* work with maps too large to fit into memory and 
* require fast access by element-id or
* require fast random access to map-elements or
* require navigation from nodes to their ways or
* from nodes, ways and relations to the relations they are a member of

Many of the special-purpose file-formats we have are
read-only. To update, you need to download a completely
new map and convert it into that format.
OSMbin was designed with fixed-size records
so that it can be read-write.
You can have your users store a continent
on their navigation-system and apply the
weekly diffs to it without making them
download more than just the weekly diffs.
(What end-user wants to download and convert
 a planet-file more than once?)
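
To show why fixed-size records make a file read-write, here is a minimal
sketch. The record layout in it (node id, latitude, longitude in 16 bytes)
is invented for this example and is NOT the layout from the OSMbin
specification. The function names and the file name are made up as well.
Because every record has the same size, record number n always starts at
byte n * record-size, so it can be read or overwritten in place.

```python
import os
import struct
import tempfile

# Hypothetical record layout for this sketch only (NOT the OSMbin layout):
# 64-bit node id, then latitude and longitude as 32-bit fixed-point integers.
RECORD = struct.Struct("<qii")  # always 16 bytes
SCALE = 10_000_000              # degrees -> fixed-point

def write_node(f, slot, node_id, lat, lon):
    """Overwrite the record in the given slot without touching any other."""
    f.seek(slot * RECORD.size)
    f.write(RECORD.pack(node_id, round(lat * SCALE), round(lon * SCALE)))

def read_node(f, slot):
    """Jump straight to a slot; nothing in front of it has to be parsed."""
    f.seek(slot * RECORD.size)
    node_id, lat, lon = RECORD.unpack(f.read(RECORD.size))
    return node_id, lat / SCALE, lon / SCALE

path = os.path.join(tempfile.mkdtemp(), "nodes.bin")
with open(path, "w+b") as f:
    write_node(f, 0, 101, 52.5200, 13.4050)
    write_node(f, 1, 102, 48.1372, 11.5756)
    write_node(f, 2, 103, 50.1109, 8.6821)
    # Apply a "diff": node 102 was moved. Only its 16 bytes are rewritten.
    write_node(f, 1, 102, 48.1400, 11.5800)
    moved = read_node(f, 1)
    untouched = read_node(f, 2)

file_size = os.path.getsize(path)  # still exactly three records
```

With variable-length records the same update could change the length of
the record and force everything behind it to be moved or the file to be
rewritten.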

   Why not use XYZ?

If an application already has a native
format then this is probably already the best
there is for it and you should stick with it.

There is however a great benefit in having
a file-format used by more than one application.
You can test that your implementation is working
correctly. You know that your test-data are
not incorrect. You get tools for repairing,
modifying and analysing your local map
without having to write them all alone.

   But I need XYZ...

OSMbin is specified to be able to store
anything OSM with api 0.6 can describe.
So there is nothing in OSM you cannot store
in it.
If you want to store less than everything,
you are free to filter out what you don't need.

If you do not like the indices and can think
of a better way to index the content of the
.obm-files, you can leave the .idx-files
out and store your own index-files with another
file-extension. (For example an hsqldb-database
or a better balanced 2d-index.)
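
As an illustration of what such a home-grown index-file could look like,
here is a minimal sketch of a simple 2d grid index that is kept in its own
file next to the data. Everything in it is an assumption made for the
example: the ".grid" extension, the cell size, the entry layout and the
function names are invented, and the "record number" simply stands for
whatever position your application uses to find a node in the data file.
It is not taken from the OSMbin specification.

```python
import math
import os
import struct
import tempfile
from collections import defaultdict

CELL_DEG = 0.01                # grid cell size in degrees (arbitrary choice)
ENTRY = struct.Struct("<iiq")  # cell row, cell column, record number

def cell_of(lat, lon):
    """Map a coordinate to the (row, column) of its grid cell."""
    return math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG)

def build_index(nodes):
    """nodes: iterable of (record_number, lat, lon)."""
    index = defaultdict(list)
    for record_number, lat, lon in nodes:
        index[cell_of(lat, lon)].append(record_number)
    return index

def save_index(index, path):
    """Write one fixed-size entry per indexed node."""
    with open(path, "wb") as f:
        for (row, col), records in sorted(index.items()):
            for record_number in records:
                f.write(ENTRY.pack(row, col, record_number))

def load_index(path):
    index = defaultdict(list)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(ENTRY.size), b""):
            row, col, record_number = ENTRY.unpack(chunk)
            index[(row, col)].append(record_number)
    return index

def candidates(index, min_lat, min_lon, max_lat, max_lon):
    """Record numbers of all nodes in grid cells touching the bounding box."""
    row_lo, col_lo = cell_of(min_lat, min_lon)
    row_hi, col_hi = cell_of(max_lat, max_lon)
    found = []
    for row in range(row_lo, row_hi + 1):
        for col in range(col_lo, col_hi + 1):
            found.extend(index.get((row, col), []))
    return sorted(found)

# Usage: index three nodes, store the index next to the data, query it again.
nodes = [(0, 52.5234, 13.4115), (1, 52.5236, 13.4117), (2, 48.1372, 11.5756)]
index_path = os.path.join(tempfile.mkdtemp(), "nodes.grid")
save_index(build_index(nodes), index_path)
print(candidates(load_index(index_path), 52.52, 13.41, 52.53, 13.42))
```

The query only returns candidates from the touched cells. An application
would still read those records and check their exact coordinates.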

   What do I get?

You not only get the
* complete and final specification
http://wiki.openstreetmap.org/wiki/OSMbin(file_format)
http://wiki.openstreetmap.org/wiki/OSMbin(file_format)/version_1.0

but also a
* well commented and easy to understand Reference-implementation 
http://travelingsales.svn.sourceforge.net/viewvc/travelingsales/libosm/src/org/openstreetmap/osm/data/osmbin/
(If you do not understand something or find a bug, please
 inform me and I'll clarify the documentation/fix the bug. You are not alone
here.)

and also
* Osmosis-tasks for writing OsmBin and repairing the index of broken
osmbin-files
http://wiki.openstreetmap.org/wiki/Osmosis/DetailedUsage#Plugin_Tasks
(So you need not write your own map-importer first but can concentrate on your
 application.)
In the near future you will also get
* osmosis-tasks to repair and analyse the file
(The specification contains rules to define how to repair broken references
 in the .obm-files and the .idx and .id2 can be re-generated completely
already.)

Thus you do not start at 0 but get a whole lot of tested infrastructure
to make debugging your own implementation and generating test-data easy.
There are also people you can simply ask, because it is not something
only you know.

   What will change in the future?

The .idx and .id2 are not perfect.
I meant them to be easy to understand and easy to
implement without having to take a course in geographic database-design
and multidimensional index-structures first.
As a result however they are not guaranteed to be balanced.
So if you can specify and implement a better 1d- or 2d-index, you
are welcome.

If there is interest, I may write a small graphical tool to take
a directory with OSMbin -data and let you display the content of
records, so you can easily analyse your own implementation.
If you are a Java-developer, you can use the reference-implementation
right away by checking out LibOSM (a part of Traveling Salesman)
https://sourceforge.net/svn/?group_id=203597

Greetings.
Marcus