[OSM-dev] release of full-history extracts

Peter Körner osm-lists at mazdermind.de
Thu May 12 09:46:53 BST 2011


Hi

I'm very proud to announce the release of the first history extracts. 
They have been created from the latest full-experimental-dump [1] using 
my history splitter [2], which is based on a slightly modified version of 
Jochen Topf's really great Osmium framework [3]. Unlike regular extracts, 
they contain multiple versions of each object. If you just want the 
map data as it is today, use the Geofabrik extracts [4].

The extracts can be downloaded from GWDG:

<ftp://ftp5.gwdg.de/pub/misc/openstreetmap/osm-full-history-extracts/110418/hardcut-bbox-xml/>


Their size ranges from very small (a village) via medium (Berlin) to 
large (Germany). Together they cover only a very, very small part of the 
world and are currently aimed at application developers who are looking 
for data to test their history analysis apps. The extracts are in the 
OSM XML format with the visible flag included, and some of them are 
bzip2-compressed. Because they are not normal .osm files, they carry 
the file extension .osh. Some common programs like JOSM can open them 
when you rename them to .osm, but the resulting output is not very 
useful in most cases.
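For developers who want a first look at an extract without JOSM, here is a minimal Python sketch, using only the standard library, that streams an .osh file and counts object versions by type, treating `visible="false"` as a deleted version. The function name `count_versions` is hypothetical and not part of the splitter, and the sketch assumes that compressed files carry a `.bz2` suffix.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter

OBJECT_TAGS = ("node", "way", "relation")

def count_versions(path):
    """Count object versions in an .osh file by (type, visibility).

    Hypothetical helper: assumes compressed files end in ".bz2" and
    everything else is plain OSM XML.
    """
    opener = bz2.open if path.endswith(".bz2") else open
    counts = Counter()
    with opener(path, "rb") as f:
        # iterparse streams the file, so large extracts need not fit in memory
        context = ET.iterparse(f, events=("start", "end"))
        _event, root = next(context)  # the <osm> root element
        for event, elem in context:
            if event == "end" and elem.tag in OBJECT_TAGS:
                # history files mark deleted versions with visible="false"
                state = "deleted" if elem.get("visible") == "false" else "visible"
                counts[(elem.tag, state)] += 1
                root.clear()  # drop finished objects to keep memory flat
    return counts
```

The result is a `Counter` keyed by pairs such as `("way", "deleted")`, which is a quick sanity check before feeding an extract into a history analysis app.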

They are cut using simple bounding boxes [5] and the 
hardcut algorithm [6]. Extracts created with that algorithm have the 
following characteristics:
  - ways are cropped at bbox boundaries
  - relations contain only members that exist in the extract
  - ways and relations are reference-complete
  - relations referring to relations that come later in the file are 
missing these references
  - ways that have only one node inside the bbox are missing from the output
  - only versions of an object that are inside the bboxes are in the 
extract, so some versions of an object may be missing
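To make the list above concrete, here is a deliberately simplified Python sketch of a single-pass cut in the spirit of the hardcut algorithm. It is not a port of hardcut.hpp [6]: the types and the `hardcut` function are hypothetical, deleted versions are ignored, and it assumes that a node id counts as present once any of its versions lay inside the bbox and that a relation version is dropped when none of its members survive. Because objects arrive in file order (nodes, then ways, then relations), a reference to a relation that appears later cannot be resolved yet, which reproduces the missing-reference behavior described above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple, Union

@dataclass
class Node:
    id: int
    version: int
    lat: float
    lon: float

@dataclass
class Way:
    id: int
    version: int
    nodes: List[int]  # node ids, in order

@dataclass
class Relation:
    id: int
    version: int
    members: List[Tuple[str, int]]  # (member type, ref), e.g. ("way", 10)

Obj = Union[Node, Way, Relation]

@dataclass(frozen=True)
class BBox:
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)

def hardcut(stream: Iterable[Obj], bbox: BBox) -> Iterator[Obj]:
    """Hypothetical, simplified cut of a stream ordered nodes, ways, relations."""
    # ids already written to the extract, per object type
    present = {"node": set(), "way": set(), "relation": set()}
    for obj in stream:
        if isinstance(obj, Node):
            # keep only node versions whose position is inside the bbox
            if bbox.contains(obj.lat, obj.lon):
                present["node"].add(obj.id)
                yield obj
        elif isinstance(obj, Way):
            # crop the way to nodes in the extract; fewer than 2 means drop it
            inside = [ref for ref in obj.nodes if ref in present["node"]]
            if len(inside) >= 2:
                present["way"].add(obj.id)
                yield Way(obj.id, obj.version, inside)
        else:
            # keep members already in the extract; later relations are unknown yet
            kept = [(kind, ref) for kind, ref in obj.members if ref in present[kind]]
            if kept:
                present["relation"].add(obj.id)
                yield Relation(obj.id, obj.version, kept)
```

A real implementation also has to handle deleted versions and multiple bboxes at once; the sketch only shows why a single forward pass produces the cropping and missing-reference effects listed above.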


Generating these extracts took 18 hours on a recent home computer with 
4 cores and 4 GB of RAM. The 25 GB compressed dump expands to 451 GB of 
uncompressed XML, which contains 1,761,267,506 node versions, 171,118,257 
way versions and 4,461,725 relation versions. The long processing time 
resulted primarily from the blocking communication between the splitter 
and the bzip2 compressors: uncompressing and parsing the 
full-experimental-dump took 8 hours when writing to uncompressed XML 
files, but 18 hours when writing to bz2-compressed files, so bzip2 is the 
major bottleneck here. The next logical step will be to define a PBF 
format for history files. Once it exists, I look forward to releasing a 
PBF version of the full-experimental-dump as well as more extracts.
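The slowdown described above comes from the splitter waiting on the compressors. One common way to decouple them, sketched here in Python under the assumption of one output file per extract, is to hand finished chunks of XML to a background thread through a bounded queue, so the splitter only blocks when a compressor falls far behind. `CompressingWriter` is a hypothetical name, not part of the splitter, and error handling is omitted for brevity; in a C++ tool such as the splitter the same pattern would use one compressor thread per output file.

```python
import bz2
import queue
import threading

_STOP = object()  # sentinel telling the worker thread to finish

class CompressingWriter:
    """Compress bytes into a .bz2 file on a background thread.

    write() only blocks when max_chunks pending chunks are already queued,
    so the producer is not stalled by every single compression step.
    """

    def __init__(self, path, max_chunks=64):
        self._queue = queue.Queue(maxsize=max_chunks)
        self._thread = threading.Thread(target=self._run, args=(path,))
        self._thread.start()

    def _run(self, path):
        # the worker owns the compressed file and drains the queue in order
        with bz2.open(path, "wb") as out:
            while True:
                chunk = self._queue.get()
                if chunk is _STOP:
                    break
                out.write(chunk)

    def write(self, data):
        self._queue.put(data)

    def close(self):
        # signal the end of input and wait until everything is flushed
        self._queue.put(_STOP)
        self._thread.join()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

Usage: create one writer per extract, pass each serialized chunk from the splitter to `write()`, and call `close()` (or leave the `with` block) at the end. A larger `max_chunks` absorbs longer compression stalls at the cost of memory.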

Peter



[1] <http://planet.osm.org/full-experimental/full-planet-110418-0000.osm.bz2>
[2] <https://github.com/MaZderMind/OpenStreetMap-History-API/tree/0.1/splitter>
[3] <https://github.com/MaZderMind/osmium/tree/splitter-1.0>
[4] <http://download.geofabrik.de/>
[5] <ftp://ftp5.gwdg.de/pub/misc/openstreetmap/osm-full-history-extracts/bbox.config>
[6] <https://github.com/MaZderMind/OpenStreetMap-History-API/blob/0.1/splitter/hardcut.hpp>


