[OSM-dev] Proposed schema for the OSM binary format.

jamesmikedupont at googlemail.com jamesmikedupont at googlemail.com
Tue Aug 10 06:33:26 BST 2010


hi scott,
what has changed since the beginning?
the only thing that I would like to see is some form of tree structure
being possible; we should consider that a quadtree could be
implemented using protobuf for optimal random access.
all the best,
mike

On Mon, Aug 9, 2010 at 8:06 PM, Scott Crosby <scrosby at cs.rice.edu> wrote:
> As I'm in the process of packaging the osm binary format as part of
> osmosis, I expect that it may start getting used fairly soon.
>
> Like any fileformat, the binary format requires a schema which will be
> hard to change. Before finalizing it, I would like to run my schema
> design past stakeholders for feedback and suggestions before people
> start using it.
>
> Once the schema is set and data is in that format, changing the schema
> becomes MUCH more difficult.
>
> My design includes several features that we may want to strip out,
> such as flags indicating if the dataset is already pre-sorted, etc.
> The schema isn't the implementation; not every feature needs to be
> fully supported or used by the current toolset, but I would like to be
> forward thinking and consider future use cases.
>
> For instance, one feature that I thought was worth including was an
> offset field so that the binary format can represent isohypsis data
> efficiently, setting its grid spacing to be the same as that of the
> isohypsis data, and then adjust the offset to make the grids align.
> For forward compatibility with future files that include an offset, it
> must be, and is, implemented by the parser, even if it is not
> supported by the serializer. Another feature was dataset version
> numbers (suggested by Frederik Ramm) for auto-applying deltas. What
> other features should be considered and added?
>
> I am placing my current protobuf source file here, please reply with
> any questions/comments or suggestions.
>
> Thank you,
> Scott
>
> ////////////////////////////////////////
>
>> /* OSM Binary file format
>>
>> This is the master schema file of the OSM binary file format. This
>> file is designed to support limited random-access and future
>> extendability.
>>
>> A binary OSM file consists of a sequence of FileBlocks (please see
>> fileformat.proto). The first fileblock contains a serialized instance
>> of HeaderBlock, followed by a sequence of PrimitiveBlock blocks that
>> contain the primitives.
>>
>> Each primitiveblock is designed to be independently parsable. It
>> contains a string table storing all strings in that block (keys and
>> values in tags, roles in relations, usernames, etc.) as well as
>> metadata containing the precision of coordinates or timestamps in that
>> block.
>>
>> A primitiveblock contains a sequence of primitive groups, each
>> containing primitives of the same type (nodes, densenodes, ways,
>> relations). Coordinates are stored in signed 64-bit integers. Lat and lon
>> are measured in units of <granularity> nanodegrees. The default
>> granularity of 100 nanodegrees corresponds to about 1cm on the ground,
>> and a full lat or lon fits into 32 bits.
>>
>> Converting an integer to a latitude or longitude uses the formula:
>> $OUT = IN * granularity / 10**9$. Many encoding schemes use delta
>> coding when representing nodes and relations.
>>
>> */
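A minimal sketch of the conversion formula in the comment above, written in Python. The function name `to_degrees` is hypothetical, and the way the offset is applied is an assumption: the schema defines `lat_offset` and `lon_offset` in nanodegrees, but the formula in the comment does not show them, so adding the offset before scaling is only one plausible reading.

```python
NANO = 10**9  # nanodegrees per degree


def to_degrees(raw, granularity=100, offset=0):
    """Convert a stored integer coordinate to degrees.

    With offset=0 this is OUT = IN * granularity / 10**9 from the
    schema comment. The offset term (in nanodegrees) is an assumption
    about how lat_offset / lon_offset are meant to be applied.
    """
    return (offset + raw * granularity) / NANO


# Largest stored value needed for a longitude of 180 degrees
# at the default granularity of 100 nanodegrees.
max_raw_lon = 180 * NANO // 100
```

For example, `to_degrees(515000000)` gives 51.5 degrees at the default granularity, and `max_raw_lon` is 1800000000, which is below 2**31 and so consistent with the remark that a full lat or lon fits into 32 bits.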
>>
>> //////////////////////////////////////////////////////////////////////////
>> //////////////////////////////////////////////////////////////////////////
>>
>> /* Contains the file header. */
>>
>> message HeaderBlock {
>>   required HeaderBBox bbox = 1;
>>
>>   // Author, name, and version number of the dataset in this file. (to permit
>>   // patches/updates to be incrementally applied)
>>   optional string datasetauthor = 16; // TODO: WANT THIS?
>>   optional string datasetname = 17;  // TODO: WANT THIS?
>>   optional int64 version = 18; // TODO: WANT THIS?
>>
>>   // Program generating this data
>>   optional string writingprogram = 19;  // TODO: WANT THIS?
>>
>>   /* Additional tags to aid in parsing this dataset */
>>   repeated string required_features = 4; // TODO: WANT THIS?
>>   repeated string optional_features = 5; // TODO: WANT THIS?
>> }
>>
>> /*
>>
>> Required features are features that an implementation must
>> understand in order to be able to parse the OSM entities in the
>> file. If a program sees a required feature that it does not
>> understand, it must error out. Currently the following features are
>> defined:
>>
>>   "DenseNodes" -- File uses dense nodes.
>>
>>
>> Optional features are features that a file has that a program may
>> exploit if it chooses to.
>>
>>   "Has_Metadata" -- Does the file contain author and timestamp metadata?
>>   "Sort.Type_then_ID" -- Entities are sorted by type then ID.
>>   "Sort.Geographic" -- Entities are in some form of geometric sort.
>>
>> */
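The rule for required features described above could be checked roughly as follows. This is only a sketch: `SUPPORTED_FEATURES`, `UnsupportedFeatureError` and `check_required_features` are hypothetical names, and the list of strings stands in for the `required_features` field of a parsed HeaderBlock.

```python
# Features this hypothetical reader knows how to parse.
SUPPORTED_FEATURES = {"DenseNodes"}


class UnsupportedFeatureError(Exception):
    """Raised when a file requires a feature the reader does not know."""


def check_required_features(required_features, supported=SUPPORTED_FEATURES):
    """Error out if any required feature is not understood.

    Optional features are deliberately not checked here: a reader may
    ignore them.
    """
    missing = [f for f in required_features if f not in supported]
    if missing:
        raise UnsupportedFeatureError(
            "file requires unsupported features: " + ", ".join(missing)
        )
```

Calling `check_required_features(["DenseNodes"])` returns quietly, while a list containing an unknown string raises before any primitive is parsed.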
>>
>> /** The bounding box (BBOX) field, as used in the OSM header. Units
>> are in nanodegrees. */
>>
>> message HeaderBBox {
>>    required sint64 left = 1;
>>    required sint64 right = 2;
>>    required sint64 top = 3;
>>    required sint64 bottom = 4;
>> }
>>
>>
>> ///////////////////////////////////////////////////////////////////////
>> ///////////////////////////////////////////////////////////////////////
>>
>>
>> message PrimitiveBlock {
>>   required StringTable stringtable = 1;
>>   repeated PrimitiveGroup primitivegroup = 2;
>>
>>   // Granularity, units of nanodegrees, used to store coordinates in this block
>>   optional int32 granularity = 17 [default=100];
>>   // Offset value between the output coordinates and the granularity grid, in units of nanodegrees.
>>   optional int64 lat_offset = 19 [default=0];
>>   optional int64 lon_offset = 20 [default=0];
>>
>>   // Granularity of dates, normally represented in units of milliseconds since the 1970 epoch.
>>   optional int32 date_granularity = 18 [default=1000];
>>
>>
>>   // Optional extensions, also included in index data.
>>   //optional BBox bbox = 19;  // TODO: WANT THIS?
>> }
>>
>> // Group of OSMPrimitives. All primitives in a group must be the same type.
>> message PrimitiveGroup {
>>   optional DenseNodes dense = 2;
>>   repeated Way      ways = 3;
>>   repeated Relation relations = 4;
>>   repeated ChangeSet changesets = 5;
>> }
>>
>>
>> /** String table, contains the common strings in each block.
>>
>>  Note that we reserve index '0' as a delimiter, so this entry in the table
>>  is ALWAYS blank and unused.
>>
>>  */
>> message StringTable {
>>    repeated bytes s = 1;
>> }
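To illustrate how a per-block string table with a reserved index 0 might be built, here is a rough Python sketch. The function name `build_string_table`, the use of UTF-8 for the `bytes` entries, and the order in which strings are added are all assumptions; the schema only says that entry 0 is always blank and that keys and vals are stored as string IDs in parallel arrays.

```python
def build_string_table(tag_dicts):
    """Build a string table plus parallel key/val index arrays.

    tag_dicts is a list of {key: value} dicts, one per primitive.
    Index 0 is reserved (always blank), so real strings start at 1.
    """
    table = [b""]  # index 0: reserved delimiter, never a real string
    index = {}     # bytes -> position in table

    def intern(text):
        data = text.encode("utf-8")  # assumption: strings stored as UTF-8
        if data not in index:
            index[data] = len(table)
            table.append(data)
        return index[data]

    encoded = []
    for tags in tag_dicts:
        keys = [intern(k) for k in tags]
        vals = [intern(v) for v in tags.values()]
        encoded.append((keys, vals))
    return table, encoded


table, encoded = build_string_table([
    {"highway": "residential", "name": "Main St"},
    {"highway": "residential"},
])
```

Here the second primitive reuses the indexes already assigned to "highway" and "residential", which is where the space saving comes from.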
>>
>> /* Optional metadata that may be included into each primitive. */
>> message Info {
>>    optional int32 version = 1 [default = -1];
>>    optional int32 timestamp = 2;
>>    optional int64 changeset = 3;
>>    optional int32 uid = 4;
>>    optional int32 user_sid = 5;
>> }
>>
>> // TODO: REMOVE THIS? NOT in osmosis schema.
>> message ChangeSet {
>>    required int64 id = 1;
>>    // Parallel arrays.
>>    repeated uint32 keys = 2 [packed = true]; // String IDs.
>>    repeated uint32 vals = 3 [packed = true]; // String IDs.
>>
>>    optional Info info = 4;
>>
>>    required int64 created_at = 8;
>>    optional int64 closetime_delta = 9;
>>    required bool open = 10;
>>    optional HeaderBBox bbox = 11;
>> }
>>
>> /* Used to densely represent a sequence of nodes that do not have any tags.
>>
>> We represent these nodes columnwise as five columns: IDs, lats, and
>> lons, all delta coded, and an optional info column when metadata is
>> not omitted.
>>
>> We encode keys & vals for all nodes too, in the form of ONE array of
>> integers containing key-id and val-id values, using 0 as a delimiter
>> between nodes.
>>
>>    ( (<keyid> <valid>)* '0' )*
>>  */
>> message DenseNodes {
>>    repeated sint64 id = 1 [packed = true]; // DELTA coded
>>
>>    repeated Info info = 4;
>>
>>    repeated sint64 lat = 8 [packed = true]; // DELTA coded
>>    repeated sint64 lon = 9 [packed = true]; // DELTA coded
>>
>>    // Special packing of keys and vals into one array.
>>    repeated int32 keys_vals = 10 [packed = true];
>> }
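A reader for the DenseNodes columns might look roughly like the Python sketch below. The helper names are hypothetical, and it assumes the first delta in each column is relative to 0, which the schema does not state explicitly (it only says "DELTA coded").

```python
from itertools import accumulate


def delta_decode(deltas):
    """Turn a delta-coded column back into absolute values.

    Assumption: the first entry is a delta from 0.
    """
    return list(accumulate(deltas))


def split_keys_vals(keys_vals):
    """Unpack ( (<keyid> <valid>)* '0' )* into per-node (key, val) lists."""
    nodes, current = [], []
    it = iter(keys_vals)
    for key in it:
        if key == 0:
            # Delimiter: the current node's tags are complete.
            nodes.append(current)
            current = []
        else:
            val = next(it, None)
            if val is None:
                raise ValueError("key id without a matching val id")
            current.append((key, val))
    return nodes


ids = delta_decode([100, 1, 1, 5])
tags = split_keys_vals([1, 3, 2, 4, 0, 0, 1, 3, 0])
```

With this input, `ids` is `[100, 101, 102, 107]` and `tags` is `[[(1, 3), (2, 4)], [], [(1, 3)]]`: the second node has no tags, so it contributes only its delimiter. The key and val numbers are indexes into the block's string table.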
>>
>>
>> message Way {
>>    required int64 id = 1;
>>    // Parallel arrays.
>>    repeated uint32 keys = 2 [packed = true];
>>    repeated uint32 vals = 3 [packed = true];
>>
>>    optional Info info = 4;
>>
>>    repeated sint64 refs = 8 [packed = true];  // DELTA encoded
>> }
>>
>> message Relation {
>>   enum MemberType {
>>     NODE = 0;
>>     WAY = 1;
>>     RELATION = 2;
>>   }
>>    required int64 id = 1;
>>    // Parallel arrays.
>>    repeated uint32 keys = 2 [packed = true];
>>    repeated uint32 vals = 3 [packed = true];
>>
>>    optional Info info = 4;
>>
>>    // Parallel arrays
>>    repeated int32 roles_sid = 8 [packed = true];
>>    repeated sint64 memids = 9 [packed = true]; // DELTA encoded
>>    repeated MemberType types = 10 [packed = true];
>> }
>>
>
>



-- 
James Michael DuPont
Member of Free Libre Open Source Software Kosova and Albania
flossk.org flossal.org


