[OSM-dev] Daily Diff Merging

Brett Henderson brett at bretth.com
Sat Oct 20 06:38:36 BST 2007


I've timed applying the latest daily changeset to produce a new daily 
snapshot.

time osmosis --read-xml file=snapshot-20071019.osm \
  --read-xml-change file=2007-10-19--10-20-00\:01\:02.osc \
  --apply-change \
  --write-xml file=snapshot-20071020.osm

real    63m7.739s
user    46m45.776s
sys     8m14.367s

I'm not sure I can improve this speed much further.  The merge process 
involves several threads communicating via shared buffers, which adds 
synchronisation overhead.  Just over an hour to produce a new planet 
isn't too bad though.  Of course this doesn't include compression, which 
is the real performance bottleneck.
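
For anybody who wants to put those timings in perspective, here is a 
rough back-of-envelope calculation.  It assumes the input file is the 
18,406,415,738 byte snapshot-20071019.osm from the quoted message below; 
the to_seconds helper is only an illustrative name, not part of osmosis.

```shell
# Convert a "time" figure given as minutes and seconds into seconds.
to_seconds() {
    awk -v m="$1" -v s="$2" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

real=$(to_seconds 63 7.739)
user=$(to_seconds 46 45.776)
sys=$(to_seconds 8 14.367)

# Assumption: size of the input snapshot, taken from the quoted message.
input_bytes=18406415738

awk -v r="$real" -v u="$user" -v s="$sys" -v b="$input_bytes" 'BEGIN {
    # Share of wall-clock time covered by CPU time (user + sys).
    printf "cpu/wall ratio: %.2f\n", (u + s) / r
    # Input bytes consumed per second of wall-clock time.
    printf "input throughput: %.2f MB/s\n", b / r / 1000000
}'
# cpu/wall ratio: 0.87
# input throughput: 4.86 MB/s
```

So total CPU time is a little under the wall-clock time, and the input 
is being consumed at just under 5 MB/s.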

Unless further problems are discovered, this is ready for use by anybody 
who wishes to obtain up-to-date, consistent daily snapshots of OSM data.

Osmosis can also apply these changesets directly to a MySQL database, 
avoiding the need to create new planet files.  I haven't tested the 
performance of that with recent data though.

Several people have already started to play with this, but if anybody 
has any questions or issues, please shout.

Documentation is available at:
http://wiki.openstreetmap.org/index.php/Osmosis
Any documentation improvements or additional examples are welcome.

Some possible ways of taking this further are:
* Create scripts to automate downloading daily changesets and patching 
local planet files or databases.
* Integrate into the Mapnik infrastructure to provide more up-to-date maps.
* Integrate into the tiles at home infrastructure to improve scalability 
of rendering.
* Decrease the changeset interval below one day to provide more 
up-to-date data.
* Update the planet creation process to include user information, 
allowing more complete database replicas to be created.
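
As a starting point for the first idea, here is a minimal sketch of the 
patching half of such a script.  It only uses the osmosis options that 
already appear in this thread (--read-xml, --read-xml-change, 
--apply-change, --write-xml and compressionMethod=bzip2).  The function 
name osmosis_apply_cmd is made up for illustration, downloading is left 
out, and it assumes file names contain no whitespace.  It prints the 
command rather than running it so the result can be checked first.

```shell
# Build an osmosis command line that applies one or more change files,
# in the order given, to a base snapshot.
# Usage: osmosis_apply_cmd BASE.osm OUTPUT.osm CHANGE.osc [CHANGE.osc ...]
# Hypothetical helper: it only prints the command, it does not run it.
osmosis_apply_cmd() {
    base=$1
    out=$2
    shift 2

    cmd="osmosis --read-xml file=$base"
    for change in "$@"; do
        cmd="$cmd --read-xml-change file=$change"
        # Change files that are still bzip2 compressed need the
        # compressionMethod option, as in the quoted example.
        case $change in
            *.bz2) cmd="$cmd compressionMethod=bzip2" ;;
        esac
        cmd="$cmd --apply-change"
    done
    cmd="$cmd --write-xml file=$out"

    printf '%s\n' "$cmd"
}

# Example: print the command for patching yesterday's snapshot.
osmosis_apply_cmd snapshot-20071019.osm snapshot-20071020.osm \
    2007-10-19--10-20-00:01:02.osc
```

Once the printed command looks right it can be pasted into a shell, and 
the same function handles several days of diffs by listing them in 
date order.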

All thoughts welcome.

Cheers,
Brett

Brett Henderson wrote:
> The latest osmosis v0.20 appears to be working correctly.
>
> I patched to the latest planet-071017.osm which is 17,917,842,715 
> bytes using planetdiff.
> I then downloaded the last three daily diffs.
> 2007-10-16--10-17-00:01:01.osc.bz2
> 2007-10-17--10-18-00:01:02.osc.bz2
> 2007-10-18--10-19-00:01:02.osc.bz2
>
> I applied the last three daily diffs as follows:
> osmosis --read-xml file=planet-071017.osm --read-xml-change 
> file=2007-10-16--10-17-00\:01\:01.osc.bz2 compressionMethod=bzip2 
> --apply-change --read-xml-change 
> file=2007-10-17--10-18-00\:01\:02.osc.bz2 compressionMethod=bzip2 
> --apply-change --read-xml-change 
> file=2007-10-18--10-19-00\:01\:02.osc.bz2 compressionMethod=bzip2 
> --apply-change --write-xml file=snapshot-20071019.osm
>
> I now have a snapshot-20071019.osm file that is 18,406,415,738 bytes.  
> The dataset is growing fast!



