[OSM-dev] Working with OSM data with less or no metadata

Michael Reichert michael.reichert at geofabrik.de
Wed Feb 14 09:30:18 UTC 2018


Hi,

people are talking about potential changes to the amount of (personal)
data distributed by OSM, in the light of new data protection laws
becoming effective in the EU this May. There haven't been any official
statements by the OSMF but discussions are going on in the LWG [1].

Even though it is still unclear what the concrete steps will be, I have
done some experiments. How well do our existing tools behave if you feed
them with OSM data that has less metadata than usual, or no metadata at
all? I have set up a test suite which tests Osmium-Tool (which uses the
Libosmium library; master branch), Osmosis 0.44.1 and Osmconvert 0.6.

The test suite is available at
https://github.com/geofabrik/metadata-test/
and consists of a Bash script. You need to have osmium, osmosis and
osmconvert in your path (or you have to modify the script a bit). The
test suite comes with its own hand-crafted test data, which is first
converted to PBF by Osmium. Afterwards, all three tools are put through
the following challenges:

- converting XML to PBF
- converting PBF to XML
- converting XML to XML
- applying a diff
- deriving changes between two OSM files

All challenges are run four times: once with full metadata, once with
only the timestamp and version fields, once with only the version field
and once without any metadata. Some PBF challenges also have two
variants, one with DenseNodes and one without.

The results are written to files in the output/ directory. You have to
inspect them manually; I have not written a tool to parse them and
report how many tests failed.
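As a starting point for automating part of that manual inspection, the sketch below counts how many objects in an OSM XML (or OSC) output file carry each metadata attribute, so the result can be compared with the metadata level of the input. It is a hypothetical helper, not part of the test suite; it only handles XML output, and the file names in the usage comment are illustrative.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter

METADATA_ATTRS = ("version", "timestamp", "changeset", "uid", "user")
OBJECT_TAGS = ("node", "way", "relation")


def metadata_summary(xml_text: str) -> tuple[int, Counter]:
    """Return (number of objects, per-attribute count of objects carrying it)."""
    root = ET.fromstring(xml_text)
    total = 0
    counts: Counter = Counter()
    # iter() also descends into <create>/<modify>/<delete> blocks of OSC files.
    for elem in root.iter():
        if elem.tag in OBJECT_TAGS:
            total += 1
            counts.update(a for a in METADATA_ATTRS if a in elem.attrib)
    return total, counts


if __name__ == "__main__":
    # Usage (illustrative): python metadata_summary.py output/*.osm
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            total, counts = metadata_summary(f.read())
        print(path, total, dict(counts))
```

A file produced from version-only input that reports timestamps or changesets would point at injected default values.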

*Results*
I compiled the results into a spreadsheet. You can download it at
https://github.com/geofabrik/metadata-test/raw/master/table.ods

To sum them up:
- Osmium is the only programme which passes all format conversion tests.

- Osmosis cannot read XML files (OSM or OSC) that lack the timestamp and
version fields.

- Osmosis and Osmconvert [2] treat all metadata fields in the DenseInfo
message of the PBF format as mandatory, although the format
specification does not declare them mandatory. As a result, they write
default values into PBF files if the input lacks these fields:
  Osmosis [3]: version="-1" timestamp="1969-12-31T23:59:59Z" changeset="-1"
  Osmconvert:  timestamp="1970-01-01T00:00:01Z" changeset="1" version="1"
This partially applies to the XML output of Osmosis, too.

- Deriving a diff file of the changes between two OSM files only works
if both files carry the same metadata fields. If one file contains fewer
or more metadata fields, every object appears in the diff file with its
new metadata, which bloats the diff. The open question is whether this
is desired behaviour (it allows stripping metadata from a file by means
of a large diff) or whether tools generating diffs should instead
compare the tags, location and members of objects which have the same
ID but different metadata.

- Some tools have bugs which lead to wrong diffs (e.g. missing
modifications) if some metadata fields are missing.
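Regarding the DenseInfo point above: a reader that treats the DenseInfo fields as optional could report missing columns as absent instead of substituting defaults. This is a minimal sketch, not code from any of the tested tools. It assumes the protobuf message has already been parsed into Python lists and that node IDs are already delta-decoded; following the PBF format description, version is stored as plain values while timestamp and changeset are delta-coded.

```python
from itertools import accumulate
from typing import Optional


def decode_dense_info(
    ids: list[int],
    versions: list[int],
    timestamp_deltas: list[int],
    changeset_deltas: list[int],
) -> list[dict[str, Optional[int]]]:
    """Combine already-parsed DenseInfo columns into per-node metadata.

    An empty column means the writer omitted that field, so it is reported
    as None instead of being replaced with a made-up default value.
    uid and user_sid (also delta-coded) are left out for brevity.
    """
    n = len(ids)
    # Undo delta coding; timestamps stay in the block's date_granularity units.
    timestamps = list(accumulate(timestamp_deltas))
    changesets = list(accumulate(changeset_deltas))

    def column(values: list[int]) -> list[Optional[int]]:
        if not values:
            return [None] * n
        if len(values) != n:
            raise ValueError("DenseInfo column length does not match number of nodes")
        return list(values)

    return [
        {"id": i, "version": v, "timestamp": t, "changeset": c}
        for i, v, t, c in zip(ids, column(versions), column(timestamps), column(changesets))
    ]
```

With this approach a version-only file stays version-only when read, instead of acquiring the placeholder values listed above.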
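If the second option from the diff point above were preferred, a diff generator could key objects on type and ID and compare only their content. The sketch below is illustrative only: `OsmObject` is a hypothetical in-memory model, not the data model of Osmium, Osmosis or Osmconvert, and none of these tools is claimed to work this way.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OsmObject:
    kind: str                                        # "node", "way" or "relation"
    id: int
    tags: dict[str, str] = field(default_factory=dict)
    location: Optional[tuple[float, float]] = None   # nodes only
    nodes: tuple[int, ...] = ()                      # ways only
    members: tuple[tuple[str, int, str], ...] = ()   # relations: (type, ref, role)
    version: Optional[int] = None                    # metadata, ignored when comparing
    timestamp: Optional[str] = None                  # metadata, ignored when comparing


def content(obj: OsmObject) -> tuple:
    """Everything except metadata."""
    return (obj.tags, obj.location, obj.nodes, obj.members)


def derive_changes(old: list[OsmObject], new: list[OsmObject]):
    """Return (created, modified, deleted), ignoring metadata differences."""
    old_by_key = {(o.kind, o.id): o for o in old}
    new_by_key = {(o.kind, o.id): o for o in new}
    created = [new_by_key[k] for k in sorted(new_by_key.keys() - old_by_key.keys())]
    deleted = [old_by_key[k] for k in sorted(old_by_key.keys() - new_by_key.keys())]
    modified = [
        new_by_key[k]
        for k in sorted(new_by_key.keys() & old_by_key.keys())
        if content(new_by_key[k]) != content(old_by_key[k])
    ]
    return created, modified, deleted
```

The trade-off is that such a diff can no longer be used to strip metadata from a file, which is exactly the behaviour the current tools produce as a side effect.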

Best regards

Michael


[1] https://wiki.osmfoundation.org/wiki/Working_Group_Minutes#Licensing_Working_Group
[2] Osmium also had this bug, but it was fixed on the master branch a
few days ago.
[3] Osmium cannot parse negative version numbers and throws an exception.


-- 
Michael Reichert      www.geofabrik.de
Geofabrik GmbH        Handelsregister: HRB Mannheim 703657
Amalienstr. 44        Geschaeftsfuehrung: C. Karch, F. Ramm
76133 Karlsruhe       Tel: 0721-1803560-3
reichert at geofabrik.de Fax: 0721-1803560-9
