[OSM-dev] Release candidate for OSM binary format is in osmosis trunk.

Scott Crosby scrosby at cs.rice.edu
Sun Sep 5 21:58:25 BST 2010


On Sun, Sep 5, 2010 at 11:42 AM, Frederik Ramm <frederik at remote.org> wrote:
> Scott,
>
> Scott Crosby wrote:
>>
>> message HeaderBlock {
>>  required HeaderBBox bbox = 1;
>>
>>  // Author, name, and version number of the dataset in this file (to
>>  // permit patches/updates to be incrementally applied).
>>  optional string datasetauthor = 16; // TODO: WANT THIS?
>>  optional string datasetname = 17;  // TODO: WANT THIS?
>>  optional int64 version = 18; // TODO: WANT THIS?
>>
>>  // Program generating this data
>>  optional string writingprogram = 19;  // TODO: WANT THIS?
>> }
>
> To start regular updates after importing a full planet file, one typically
> needs to find out which state.txt file on planet.openstreetmap.org to copy.
>
> The current algorithm for this is:
>
> * decide whether you want daily, hourly, or minutely updates;
> * find out the latest timestamp in your data set, or alternatively use the
> time of dataset creation
> * find the latest state.txt file from the appropriate directory that was
> created before your own latest timestamp
> * copy that to your Osmosis working directory
>
> In order to make this really easy, a data file should (in order of
> preference) either
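
As a rough illustration of the selection step in the list quoted above,
here is a minimal Python sketch. It assumes the candidate state files
have already been listed as (sequence number, creation time) pairs. The
function name pick_state and the sample values are hypothetical, and
fetching the directory listing from the server is not shown.

```python
from datetime import datetime, timezone


def pick_state(candidates, data_timestamp):
    """Return the newest (sequence, created) pair created before
    data_timestamp, or None if no candidate is old enough."""
    eligible = [c for c in candidates if c[1] < data_timestamp]
    return max(eligible, key=lambda c: c[1], default=None)


# Hypothetical listing of one replication directory.
candidates = [
    (100, datetime(2010, 9, 5, 18, 0, tzinfo=timezone.utc)),
    (101, datetime(2010, 9, 5, 19, 0, tzinfo=timezone.utc)),
    (102, datetime(2010, 9, 5, 20, 0, tzinfo=timezone.utc)),
]

# Latest timestamp found in the imported data set (also hypothetical).
latest_in_data = datetime(2010, 9, 5, 19, 30, tzinfo=timezone.utc)
chosen = pick_state(candidates, latest_in_data)
```

Picking a state strictly older than the data means some changes may be
applied twice, which is the safe direction: nothing is skipped.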

I can define fields or sub-messages in the header to store any of this
information. A representation of the state file, URLs, timestamps, or
all three.

However, your preferred suggestion is for the file to contain the
information needed to synthesize a state.txt file for updates. I can
include this in the header. I looked at some existing state files, and
here is my guess at the schema. Is it correct?

message OneReplicationStateV1 {
   required string base_url = 1;
   required int64 sequence_number = 2;
   required int64 timestamp = 3; // Milliseconds since 1970
   required int64 txn_max = 4;
   required int64 txn_max_queried = 5;
   repeated int64 txn_ready_list = 6;
   repeated int64 txn_active_list = 7;
}

message ReplicationStateV1 {
   optional OneReplicationStateV1 minute = 16;
   optional OneReplicationStateV1 hour = 17;
   optional OneReplicationStateV1 day = 18;
   optional OneReplicationStateV1 week = 19;
}
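
To make the proposed mapping concrete, here is a minimal Python sketch
that reads a state.txt-style file into a structure mirroring
OneReplicationStateV1 above. It assumes the file is a Java-properties
style key=value list with keys such as sequenceNumber and txnMax, with
colons in the timestamp escaped as "\:". The sample contents, the
base URL, and the names OneReplicationState and parse_state are
illustrative assumptions, not part of any existing tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class OneReplicationState:
    base_url: str
    sequence_number: int
    timestamp_ms: int  # milliseconds since 1970, as in the schema
    txn_max: int
    txn_max_queried: int
    txn_ready_list: list = field(default_factory=list)
    txn_active_list: list = field(default_factory=list)


def _int_list(value):
    """Parse a comma-separated list of integers; empty string gives []."""
    return [int(v) for v in value.split(",") if v.strip()]


def parse_state(text, base_url):
    """Parse state.txt-style text (assumed key=value format)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the leading comment line
        key, _, value = line.partition("=")
        # Undo the properties-file escaping of ':' in values.
        props[key.strip()] = value.strip().replace("\\:", ":")
    when = datetime.strptime(props["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    when = when.replace(tzinfo=timezone.utc)
    return OneReplicationState(
        base_url=base_url,
        sequence_number=int(props["sequenceNumber"]),
        timestamp_ms=int(when.timestamp()) * 1000,
        txn_max=int(props["txnMax"]),
        txn_max_queried=int(props["txnMaxQueried"]),
        txn_ready_list=_int_list(props.get("txnReadyList", "")),
        txn_active_list=_int_list(props.get("txnActiveList", "")),
    )


# Made-up sample; real files may carry different values or extra keys.
SAMPLE = """#Sun Sep 05 20:58:02 UTC 2010
sequenceNumber=1234
txnMaxQueried=5678
txnActiveList=
txnReadyList=
txnMax=5678
timestamp=2010-09-05T20\\:58\\:01Z
"""

state = parse_state(SAMPLE, "http://example.org/minute-replicate")
```

Note that base_url is not read from the file in this sketch; it has to
be supplied by whoever knows which replication directory the state
came from.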

One question as a sanity check: how many items will txn_active_list
and txn_ready_list typically have in the average case, the worst case
(worst 1%), and the absolute worst case (worst .0000001%)?

There's one other issue with including replication information in the header.

Making that information usable is a separate challenge, as that state
information has to be pushed through the osmosis pipe both to and from
the format. I'd have to ask Brett to chime in on how to do this; my
guess is as part of a Bounds object. I'm assuming that osmosis is used
to generate these dumps. If not, then the program generating the dump
has to be modified to generate the binary format.

Scott


