Hi Markus,<br><div class="gmail_extra"><br><div class="gmail_quote">On 24 November 2012 00:04, <span dir="ltr"><<a href="mailto:marqqs@gmx.eu" target="_blank">marqqs@gmx.eu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Brett,<br>
<div class="im"><br>
> *If* this information is intended to be used as an input into replication<br>
> processes then the sequence number is essential. Osmosis writes a<br>
> timestamp in the state.txt file, but it's only for identifying the right<br>
> sequence number to begin replication with. All replication processing<br>
> requires the sequence number. Attempting to use a timestamp is<br>
> theoretically possible but it's much less efficient and not how it was<br>
> supposed to work.<br>
<br>
</div>I think this is true for database-based updates; however, the sequence number is not really needed for the file-based updates we're presently talking about:<br>
<br>
For example, osmupdate downloads all change files, starting with the newest and going back in time, until it has downloaded every change file that is newer than the planet file's timestamp. These change files are then merged into one big change file, which is applied to the planet file.<br>
</blockquote><div><br>Yep, that will work for patching planet files. The replication tasks in Osmosis can't operate that way though.<br><br>The existing --read-replication-interval task allows limits to be specified to restrict the number of changesets downloaded at a time. This allows a local database to catch up in smaller steps if it is a long way behind. Catching up in smaller steps is preferable in this case because it deals better with the odd failure in processing (it's very frustrating to download weeks of changes only to fail near the end and have to start again), and because it prevents transaction sizes from growing unbounded. Having to wait several days for one huge catch-up transaction to be processed is far from ideal.<br>
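<br>A minimal sketch of catching up in bounded steps, keyed on sequence numbers. This is illustrative Python, not Osmosis code: the function name <code>plan_batches</code> and the <code>max_batch</code> parameter are made up for the example, and a real consumer would persist its sequence number after each batch so that a failure only costs one batch.<br>

```python
from typing import Iterator, Tuple


def plan_batches(local_seq: int, remote_seq: int, max_batch: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first, last) sequence ranges covering local_seq+1 .. remote_seq.

    local_seq  -- last sequence number already applied locally (hypothetical input)
    remote_seq -- newest sequence number published by the server (hypothetical input)
    max_batch  -- upper bound on change files processed per step
    """
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    first = local_seq + 1
    while first <= remote_seq:
        last = min(first + max_batch - 1, remote_seq)
        yield first, last
        first = last + 1


# Example: the local copy is at sequence 100, the server is at 110,
# and each step may cover at most 4 change files.
batches = list(plan_batches(local_seq=100, remote_seq=110, max_batch=4))
for first, last in batches:
    print(f"apply change files {first}..{last}, then record {last} as the new local state")
```

Each step is a small, independently committed unit, so transaction size is capped by <code>max_batch</code> rather than by how far behind the database is.<br>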
<br>For patching planet files it's less of an issue, for three reasons. You'll almost always want all available changes to be applied. The number of files being downloaded will be much smaller (you'll typically be using daily or hourly files, not minute files), so you'll be less likely to run into an intermittent network connectivity problem. And patching a file is extremely unlikely to throw errors unless you run out of disk space or have a system crash.<br>
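<br>The file-based approach described in the quoted text above can be sketched roughly as follows. This is only an illustration of the idea (walk back from the newest change file until reaching the planet file's timestamp), not osmupdate's actual code; the file names, the <code>select_change_files</code> function, and the assumption that the server listing is already sorted newest first are all hypothetical.<br>

```python
from datetime import datetime, timezone


def select_change_files(files_newest_first, planet_timestamp):
    """Return the change files newer than the planet file, oldest first.

    files_newest_first -- iterable of (name, timestamp) pairs, newest first
    planet_timestamp   -- timestamp of the planet file being patched
    """
    selected = []
    for name, stamp in files_newest_first:
        if stamp <= planet_timestamp:
            # Everything from here back is already contained in the planet file.
            break
        selected.append((name, stamp))
    # Merging and applying has to happen in chronological order.
    selected.reverse()
    return selected


def day(d):
    return datetime(2012, 11, d, tzinfo=timezone.utc)


# Hypothetical daily change files, newest first.
available = [
    ("day-24.osc.gz", day(24)),
    ("day-23.osc.gz", day(23)),
    ("day-22.osc.gz", day(22)),
    ("day-21.osc.gz", day(21)),
]

to_apply = select_change_files(available, planet_timestamp=day(22))
print([name for name, _ in to_apply])
```

Note that the early exit relies on timestamps decreasing steadily as the walk goes back in time.<br>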
<br>One other thing worth mentioning is that timestamps are not guaranteed to increase for every change file. In practice, for anything down to minute files you're unlikely to see any issues, but if the database server clock skews for any reason there's nothing to prevent time running backwards. This could cause consumers that rely on timestamps to miss data. Sequence numbers, on the other hand, are guaranteed to always increase per change file.<br>
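<br>A small made-up example of how a backwards clock step can hide a change file from a timestamp-based consumer while a sequence-based consumer still sees it. The data and both helper functions are hypothetical and only illustrate the point; they are not how Osmosis or any other tool selects files.<br>

```python
from datetime import datetime, timezone


def ts(minute):
    return datetime(2012, 11, 24, 0, minute, tzinfo=timezone.utc)


# Hypothetical (sequence number, timestamp) pairs for minute change files.
change_files = [
    (1, ts(1)),
    (2, ts(2)),
    (3, ts(4)),  # server clock has drifted ahead
    (4, ts(3)),  # clock corrected, so this timestamp is earlier than the previous one
    (5, ts(5)),
]


def pending_by_timestamp(files, last_timestamp):
    """Sequence numbers of files a consumer tracking only timestamps would fetch."""
    return [seq for seq, stamp in files if stamp > last_timestamp]


def pending_by_sequence(files, last_sequence):
    """Sequence numbers of files a consumer tracking sequence numbers would fetch."""
    return [seq for seq, _ in files if seq > last_sequence]


# Both consumers have applied everything up to and including file 3.
last_sequence, last_timestamp = change_files[2]

by_time = pending_by_timestamp(change_files, last_timestamp)
by_seq = pending_by_sequence(change_files, last_sequence)
missed = sorted(set(by_seq) - set(by_time))
print("by timestamp:", by_time, "by sequence:", by_seq, "missed:", missed)
```
<br>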
<br>This is all a bit academic for patching planet files, but Osmosis doesn't make any assumptions about how short the changeset intervals are, or what is consuming changes at the other end of the pipeline.<br><br>I could create a new task optimised for patching planet files, and perhaps that's what I (or somebody else if they wish to step in) will need to do if we embed replication information into PBF files, but it will have to remain separate from --read-replication-interval, so there'll be more code to maintain. I'm not opposed to it if it makes users' lives easier though.<br>
<br>In summary, I'd prefer to keep using sequence numbers if possible because it allows me to re-use more existing replication code, but it wouldn't be impossible to do without them.<br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Osmosis may work differently, and it may need the sequence number to start this kind of file update - I really don't know. But if so, I totally agree, we should make it possible to store sequence numbers in PBF files.<br>
<br>
Could also be done with the key-val format I suggested...<br></blockquote><div><br>Cool, I don't have any strong opinions on how the information should be stored. I'm happy to leave that in the hands of those more familiar with the PBF format.<br>
<br>Brett<br></div></div><br></div>