[OSM-dev] release of full-history extracts
Peter Körner
osm-lists at mazdermind.de
Thu May 12 09:46:53 BST 2011
Hi
I'm very proud to announce the release of the first history extracts.
They have been created from the latest full-experimental-dump [1] using
my history splitter [2], based on a slightly modified version of Jochen
Topf's really great osmium framework [3]. They contain multiple versions
of an object. If you just want the map-data as it is currently, use the
Geofabrik-Extracts [4].
The extracts can be downloaded from gwdg:
<ftp://ftp5.gwdg.de/pub/misc/openstreetmap/osm-full-history-extracts/110418/hardcut-bbox-xml/>
Their size ranges from very small (a village) via medium (Berlin) to
large (Germany). They only cover a very, very small part of the world
and are currently targeted at application developers that are looking
for data to test their history analysis apps. The extracts are in the
osm-xml format with the visible-flag included and are partially
bzip2-compressed. Because they are not normal .osm files, they carry
the file extension .osh. Some common programs like JOSM can open them
when you rename them to .osm, but the output they produce is not very
useful in most cases.
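For a first look at an .osh file without renaming it, a minimal sketch in Python may help. It is not part of the splitter. It streams the osm-xml and counts versions per object type, treating visible="false" as a deleted version. The names count_versions, open_history and SAMPLE are made up for illustration, and the sketch assumes that a missing visible attribute means the version is visible.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from collections import Counter


def open_history(path):
    """Open an .osh file, transparently handling the bzip2-compressed ones."""
    return bz2.open(path, "rb") if path.endswith(".bz2") else open(path, "rb")


def count_versions(stream):
    """Count object versions and deleted versions in an osm-xml history stream."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag in ("node", "way", "relation"):
            counts[elem.tag] += 1
            # Assumption: a missing visible attribute means "visible".
            if elem.get("visible", "true") == "false":
                counts["deleted " + elem.tag] += 1
            elem.clear()  # drop children and attributes of processed objects
    return counts


# Hypothetical sample: three versions of one node (the last one deleted)
# and a single way version.
SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6">
  <node id="1" version="1" visible="true" lat="52.5" lon="13.4"/>
  <node id="1" version="2" visible="true" lat="52.6" lon="13.4"/>
  <node id="1" version="3" visible="false"/>
  <way id="7" version="1" visible="true"><nd ref="1"/></way>
</osm>"""

if __name__ == "__main__":
    print(dict(count_versions(io.BytesIO(SAMPLE))))
```

For a real extract, pass open_history("some-extract.osh.bz2") instead of the in-memory sample; the file name here is only a placeholder.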
They are cut using simple bounding-boxes [5] and the
hardcut-algorithm [6]. Dumps created using that algorithm have the
following characteristics:
- ways are cropped at bbox boundaries
- relations contain only members that exist in the extract
- ways and relations are reference-complete
- relations referring to relations that come later in the file are
missing these references
- ways that have only one node inside the bbox are missing from the output
- only versions of an object that are inside the bboxes are in the
extract, so some versions of an object may be missing
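The characteristics above can be illustrated with a small single-pass sketch in Python. This is not the code from hardcut.hpp [6], only a simplified model of the listed behaviour. The function name hardcut, the dict layout and the bbox order (min_lon, min_lat, max_lon, max_lat) are assumptions. It further assumes that a node id counts as known once any of its versions lies inside the bbox, that relations left without members are dropped, and it ignores deleted node versions without coordinates.

```python
def hardcut(bbox, nodes, ways, relations):
    """Cut history data to a bbox in one pass: nodes, then ways, then relations."""
    min_lon, min_lat, max_lon, max_lat = bbox
    kept_nodes, kept_ways, kept_rels = [], [], []
    node_ids, way_ids, rel_ids = set(), set(), set()

    # Keep only node versions that lie inside the bbox.
    for n in nodes:
        if min_lon <= n["lon"] <= max_lon and min_lat <= n["lat"] <= max_lat:
            kept_nodes.append(n)
            node_ids.add(n["id"])

    # Crop ways to known nodes; a way needs at least two nodes to survive.
    for w in ways:
        refs = [ref for ref in w["refs"] if ref in node_ids]
        if len(refs) >= 2:
            kept_ways.append({**w, "refs": refs})
            way_ids.add(w["id"])

    # Keep only members already written. A relation that appears later in
    # the input is not known yet, so references to it are lost.
    known = {"node": node_ids, "way": way_ids, "relation": rel_ids}
    for r in relations:
        members = [(t, ref) for t, ref in r["members"] if ref in known[t]]
        if members:
            kept_rels.append({**r, "members": members})
            rel_ids.add(r["id"])

    return kept_nodes, kept_ways, kept_rels


if __name__ == "__main__":
    # Hypothetical test data, not taken from a real extract.
    nodes = [
        {"id": 1, "version": 1, "lon": 1.0, "lat": 1.0},
        {"id": 1, "version": 2, "lon": 20.0, "lat": 1.0},  # moved outside
        {"id": 2, "version": 1, "lon": 2.0, "lat": 2.0},
        {"id": 3, "version": 1, "lon": 30.0, "lat": 3.0},  # always outside
    ]
    ways = [
        {"id": 10, "version": 1, "refs": [1, 2, 3]},
        {"id": 11, "version": 1, "refs": [2, 3]},
    ]
    relations = [
        {"id": 100, "version": 1,
         "members": [("way", 10), ("way", 11), ("relation", 101)]},
        {"id": 101, "version": 1, "members": [("node", 1)]},
    ]
    for part in hardcut((0, 0, 10, 10), nodes, ways, relations):
        print(part)
```

In the sample data, way 11 is dropped because only one of its nodes is inside the bbox, and relation 100 loses its reference to relation 101 because 101 comes later in the input.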
Generating those extracts took 18 hours on a recent 4-core-4-gig home
computer. The 25 GB compressed dump expands to 451 GB uncompressed xml,
which contains 1761267506 node-versions, 171118257 way-versions and
4461725 relation-versions. The long processing time resulted primarily
from the blocking communication between the splitter and the
bzip2-compressors. Uncompressing and parsing the full-experimental-dump
took 8 hours while writing to an uncompressed xml-file and 18 hours when
writing to a bz2-compressed file, so bzip2 is the major issue here. The
next logical step will be to define a pbf-format for history files. Once
it's there I look forward to releasing a pbf-version of the
full-experimental-dump as well as more extracts.
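The blocking hand-off between the splitter and the bzip2 compressors mentioned above can be loosened with a bounded queue and a background thread per output file. The following is a sketch in Python, not how the splitter (which is written in C++) actually works. The class name BackgroundBz2Writer and the max_pending parameter are hypothetical. Whether this helps in practice depends on how many cores are free for compression, and the producer still blocks once the queue is full.

```python
import bz2
import queue
import threading


class BackgroundBz2Writer:
    """Write bytes to a .bz2 file, compressing in a background thread."""

    def __init__(self, path, max_pending=64):
        # Bounded queue: the producer only blocks when the compressor
        # is more than max_pending chunks behind.
        self._queue = queue.Queue(maxsize=max_pending)
        self._file = bz2.open(path, "wb")
        self._error = None
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def _drain(self):
        while True:
            chunk = self._queue.get()
            if chunk is None:  # sentinel from close()
                break
            try:
                self._file.write(chunk)
            except Exception as exc:
                # Remember the error but keep draining so that the
                # producer never blocks forever on a full queue.
                self._error = exc

    def write(self, chunk):
        """Queue a chunk of bytes for compression."""
        self._queue.put(chunk)

    def close(self):
        """Flush all pending chunks, close the file, re-raise write errors."""
        self._queue.put(None)
        self._thread.join()
        self._file.close()
        if self._error is not None:
            raise self._error
```

A caller would create one writer per extract, call write() with each serialized xml fragment, and call close() at the end of the run.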
Peter
[1]
<http://planet.osm.org/full-experimental/full-planet-110418-0000.osm.bz2>
[2]
<https://github.com/MaZderMind/OpenStreetMap-History-API/tree/0.1/splitter>
[3] <https://github.com/MaZderMind/osmium/tree/splitter-1.0>
[4] <http://download.geofabrik.de/>
[5]
<ftp://ftp5.gwdg.de/pub/misc/openstreetmap/osm-full-history-extracts/bbox.config>
[6]
<https://github.com/MaZderMind/OpenStreetMap-History-API/blob/0.1/splitter/hardcut.hpp>