[osmosis-dev] Question regarding the replication file structure
Brett Henderson
brett at bretth.com
Mon Jan 28 03:55:07 GMT 2013
Hi Frederik,
On 24 January 2013 09:02, Frederik Ramm <frederik at remote.org> wrote:
> Hi,
>
> I'm toying with the idea of offering regionalised diffs - i.e. a series
> of daily diffs for every regional extract that download.geofabrik.de has
> to offer. To make it easy for consumers to keep their extracts up to date,
> I thought about making an Osmosis-style directory for each extract, e.g.
> something like
>
> download.geofabrik.de/openstreetmap/europe/germany/nordrhein-westfalen/000/000/001.osc.gz
>
> or so. Just to be safe: What are the conventions that I will have to
> follow so that this works seamlessly with existing clients? Simply have a
> xxx.osc.gz and matching xxx.state.txt in the leaf directory, count from 000
> to 999 then wrap to the next directory, and have the most recent state.txt
> file at the root directory as well - anything else?
>
That sounds about right. Each state file should only need the
sequenceNumber and timestamp fields (existing hour and day replication
files only provide these two fields). The sequenceNumber is the most
important (and easiest) to get right. The timestamp should be greater than
or equal to the timestamp of the latest entity in the change file, but this
is only critical when identifying a replication start point.
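A minimal sketch of the layout and state file described above. It maps a sequence number onto the three-level 000-999 directory scheme and renders a state.txt holding only sequenceNumber and timestamp. The colon escaping ('\:') mirrors the Java properties format that existing Osmosis state files appear to use. The function names are hypothetical, and the leading '#' date comment that those files carry is omitted because properties readers treat it as a comment.

```python
from datetime import datetime, timezone


def sequence_path(seq: int) -> str:
    """Map a sequence number to the three-level 'AAA/BBB/CCC' layout.

    Each directory level holds 000-999, so 1 -> '000/000/001' and
    1234 -> '000/001/234'.
    """
    if not 0 <= seq <= 999_999_999:
        raise ValueError("sequence number does not fit a three-level layout")
    digits = f"{seq:09d}"
    return f"{digits[0:3]}/{digits[3:6]}/{digits[6:9]}"


def format_state(seq: int, ts: datetime) -> str:
    """Render a minimal state.txt holding only sequenceNumber and timestamp."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    stamp = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Properties-file escaping as seen in existing state files: ':' becomes '\:'.
    escaped = stamp.replace(":", "\\:")
    return f"sequenceNumber={seq}\ntimestamp={escaped}\n"
```

For sequence 1234 the change file would live at 000/001/234.osc.gz, next to 000/001/234.state.txt.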
The order in which you write files is important to avoid race conditions
and cope with software failures. I always write the osc.gz file, then the
state.txt file in the leaf directory, then finally the state file in the
root directory. If I encounter any failures during processing and the root
state.txt file isn't created, I simply start again and overwrite any
existing osc.gz and state.txt files in the leaf directory. The state.txt
file in the root directory is used to identify the current sequence number
at the start of processing. It is critical to ensure that only a single
process writes to the replication directory at a time.
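The write order above can be sketched as follows, assuming a root state.txt already exists (bootstrapped by hand with sequenceNumber=0, for example) and that the single-writer rule is enforced outside this code. The temporary-file-plus-rename step is an extra precaution not described in the message. The function names are hypothetical.

```python
import os


def read_sequence(state_path: str) -> int:
    """Return the sequenceNumber from a properties-style state.txt file."""
    with open(state_path, encoding="utf-8") as f:
        for line in f:
            key, sep, value = line.strip().partition("=")
            if sep and key == "sequenceNumber":
                return int(value)
    raise ValueError(f"no sequenceNumber in {state_path}")


def write_replacing(path: str, data: bytes) -> None:
    """Write to a temporary file, then rename it over the target.

    Readers never see a half-written file, and a rerun simply overwrites
    whatever an earlier failed attempt left behind.
    """
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def publish_next(root: str, osc_gz: bytes, timestamp: str) -> int:
    """Publish one change file; assumes this is the only writer for `root`.

    `timestamp` is the already-escaped value, e.g. '2013-01-28T00\\:00\\:00Z'.
    """
    root_state = os.path.join(root, "state.txt")
    seq = read_sequence(root_state) + 1  # current number comes from the root state
    digits = f"{seq:09d}"
    leaf = os.path.join(root, digits[0:3], digits[3:6], digits[6:9])
    os.makedirs(os.path.dirname(leaf), exist_ok=True)
    state = f"sequenceNumber={seq}\ntimestamp={timestamp}\n".encode("utf-8")

    write_replacing(leaf + ".osc.gz", osc_gz)    # 1. change file
    write_replacing(leaf + ".state.txt", state)  # 2. leaf state file
    write_replacing(root_state, state)           # 3. root state last: commits seq
    return seq
```

If the process dies before step 3, the root state still holds the old number, so the next run recomputes the same sequence and overwrites the leaf files, as described above.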
>
> If the frequency wasn't exactly daily - if, say, because of some sort of
> glitch there was no extract for one day and therefore the diff is missing, or
> if there were two extracts in one day - would that matter?
>
So long as the sequence number always increases by one you should be fine.
Osmosis (not sure about other clients) bases most of its processing off the
sequence number and doesn't care how far apart each time interval is. For
example, the existing minute replication sometimes has gaps much longer than
1 minute if the replication process is halted for any reason.
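A sketch of the client side of this, assuming the client already knows its local sequence number and has read the remote root state.txt. Only the numbers drive which files are fetched; how far apart in time they are plays no part. The function names are hypothetical, and any base URL passed in (such as the per-extract path proposed earlier in this thread) is illustrative rather than a confirmed live location.

```python
def sequences_to_apply(local_seq: int, remote_seq: int) -> range:
    """Sequence numbers a client still needs, oldest first.

    Only the numbers matter: consecutive files may be an hour, a day, or
    several days apart in time.
    """
    if remote_seq < local_seq:
        raise ValueError("remote sequence number is behind the local one")
    return range(local_seq + 1, remote_seq + 1)


def change_file_urls(base_url: str, local_seq: int, remote_seq: int) -> list[str]:
    """Build the .osc.gz URLs to fetch, in the order they must be applied."""
    base = base_url.rstrip("/")
    urls = []
    for seq in sequences_to_apply(local_seq, remote_seq):
        d = f"{seq:09d}"
        urls.append(f"{base}/{d[0:3]}/{d[3:6]}/{d[6:9]}.osc.gz")
    return urls
```

A client at sequence 998 facing a remote state of 1000 would fetch .../000/000/999.osc.gz and then .../000/001/000.osc.gz, whether those two files are a day or a week apart.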
Brett