[OSM-talk] Keeping imported data updated with source changes

Wiktor Niesiobedzki osm at vink.pl
Sat Jan 10 09:59:33 UTC 2015


Hi,

In Poland we have had quite a few addresses imported from government
sources for quite a long time, but as time goes on, the source
databases change, and local communities have no viable tools to track
what has changed in the source. In the city of Skarżysko-Kamienna, a
local mapper tried hard to track all the changes in the source (and to
check them on site), but still missed a lot of them. As it stands,
there is no tooling to help such users.

What I'd like to do is prepare a service that generates change files
for OSM containing the differences for each municipality, so that a
local mapper can load them, review them and decide what to import.

But to be efficient, this tool needs additional information to be
stored in OSM: the identifier of the object in the source database,
for which I propose the tag ref:addr.
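
As a rough illustration of how a tool could use such a tag, the sketch
below indexes OSM nodes by their ref:addr value. The sample XML, the
identifier format ("SRC-0001") and the function name are assumptions
made up for this example, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the identifier format is invented for illustration.
OSM_XML = """<osm version="0.6">
  <node id="1" lat="51.1" lon="20.8">
    <tag k="addr:street" v="Example Street"/>
    <tag k="addr:housenumber" v="12"/>
    <tag k="ref:addr" v="SRC-0001"/>
  </node>
  <node id="2" lat="51.2" lon="20.9">
    <tag k="addr:housenumber" v="7"/>
  </node>
</osm>"""


def index_by_source_ref(osm_xml, key="ref:addr"):
    """Map each source identifier to the tags of the OSM node carrying it."""
    index = {}
    for node in ET.fromstring(osm_xml).iter("node"):
        tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
        if key in tags:  # nodes without the identifier are not tracked
            index[tags[key]] = tags
    return index


index = index_by_source_ref(OSM_XML)
```

Node 2 has no ref:addr, so it is left out of the index; only objects
that carry the source identifier can be matched against the source.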

This tag would be used to identify what has already been imported.
I'd also like to establish a protocol: if there is some "wrong" data
in the import source, we leave a point in OSM containing only:
ref:addr
source:addr

This way we can instruct further imports to skip this point unless
the source data changes.
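
A minimal sketch of how such a comparison might work, assuming both
the source records and the OSM objects are available as dictionaries
keyed by the source identifier. The function names and the result
categories are hypothetical. How a later change in the source would be
detected for a marker point is not specified above, so the sketch
simply skips markers.

```python
def is_marker(tags):
    """A marker node carries only the identifier and source tags, no address."""
    return set(tags) <= {"ref:addr", "source:addr"}


def classify(source, osm):
    """Compare source records with OSM objects, both keyed by identifier."""
    result = {"add": [], "update": [], "skip": [], "gone": []}
    for ref, src_tags in source.items():
        osm_tags = osm.get(ref)
        if osm_tags is None:
            # Never imported, or accidentally deleted from OSM.
            result["add"].append(ref)
        elif is_marker(osm_tags):
            # A mapper marked this source record as wrong; do not re-import.
            result["skip"].append(ref)
        elif any(osm_tags.get(k) != v for k, v in src_tags.items()):
            result["update"].append(ref)
    # Present in OSM with an identifier, but no longer in the source.
    result["gone"] = [ref for ref in osm if ref not in source]
    return result


source = {
    "A": {"addr:housenumber": "1"},
    "B": {"addr:housenumber": "2"},
    "C": {"addr:housenumber": "3"},
    "D": {"addr:housenumber": "4"},
}
osm = {
    "A": {"ref:addr": "A", "addr:housenumber": "1"},
    "B": {"ref:addr": "B", "addr:housenumber": "2a"},
    "C": {"ref:addr": "C", "source:addr": "example"},
    "E": {"ref:addr": "E", "addr:housenumber": "9"},
}
changes = classify(source, osm)
```

Here "A" is unchanged and produces nothing for review, "B" differs
from the source, "C" is a marker and is skipped, "D" is missing from
OSM, and "E" has disappeared from the source.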

I find this solution the most robust, as it gives local mappers a
great signal-to-noise ratio when they are identifying what needs to
be updated, and it also gives us resilience when someone accidentally
deletes an address.

In Poland there are thousands of people employed by the government to
keep this data in good quality, and using the OSM community to
duplicate their work is, in my opinion, wasteful. Using this method,
we can build on their work and use the OSM community to improve the
data that the government is sourcing. And this is something we should
consider for all imports.

We have already had some discussion about this in the Polish
community, but as it seems it might be a philosophical change for this
project, I'd like to raise the issue at the global level.

Apart from addresses I plan to start importing national heritage
objects, for which I see exactly the same problem.

The other solution that we discussed in our community is to keep track
of the import source state in a separate database and use it to see
what has changed in the source and to generate files for local
mappers. But I see the following disadvantages of such a solution:
- it doesn't take into account the current state of objects in OSM,
which may generate duplicates or miss data that was accidentally
deleted
- it makes it harder to fork the OSM project, as you need to fork two
databases and know about both, and the license for the second database
has to be open
- it still needs some "protocol" for this database to mark that an
import was done (and to what extent); this would require additional
tooling, might be an additional problem for casual mappers, and would
probably render the tool unusable
- it gives no tools for maintaining integrity with the OSM database
- it needs additional support


The disadvantages of my solution that I found most concerning were:
- nodes containing only ref:addr and source:addr might be hard for
newcomers to understand, especially as ref:addr doesn't contain any
human-readable data
- ref:addr might get clobbered when nodes are merged

But I hope that with an extensive description on the Wiki we can
handle those problems.

Cheers,

Wiktor Niesiobędzki


