[Imports] Import of Flemish Government data (building footprints and addresses)

Mon Oct 29 21:27:33 UTC 2018

On Mon, Oct 29, 2018 at 2:55 PM Christoph Hormann <osm at imagico.de> wrote:

> For this purpose it is completely unnecessary to bother the OSM
> community with external IDs.  If you want to check if the data has been
> unchanged since you added it then do exactly that - check if there are
> any newer versions of the objects that have originally been added in
> the import.
>

> I don't really understand why this discussion is coming up again and
> again with imports here.  To me this very much looks like a 'lazy
> programmer' attitude - having an ID available probably seems the most
> convenient way to identify the object.  But think about this for a
> moment - you want to bother the OSM community with the burden of
> dealing with these IDs forever just to make it a tiny bit easier for a
> programmer to possibly in the future write code to identify what
> features from the import have not been changed since they were added.
>

The programming to which I refer is not some speculation about what I
might do 'possibly in the future,' it is the process that I'm using
right now in the present. I've done multiple updates to imported data
using exactly the technique that I describe.

You're right that I'm lazy, to the extent that I'm not willing to repeat
all of the work of the initial import (which took me several months, off
and on) every time that New York posts an update to the data set. I simply am
not going to be able to keep up with the workload unless I have tools to
recognize what has changed and speed the process of updating only
what has changed. In a way, that's an assertion that all programmers
are lazy, since any process that can be programmed can hypothetically
be done without automation!

Still, leading off by calling me a lazy programmer doesn't predispose
me to treating the rest of your arguments with the respect that they
no doubt deserve.

> Note I completely get that many programmers are unfamiliar with the OSM
> concepts of changesets, object versions etc. and for them it would
> indeed be more convenient not to have to deal with that and instead
> retrofit OSM to what they are used to.  But that is not something OSM
> should accept.
>

Very well, in addition to being lazy, I'm also ignorant. Presumably
you with your superior wisdom and industry can enlighten me.
Can you detail what your ideal process is for this use case:

I have an object in an external shapefile, and it has a stable external ID.
I have likely imported that object before. I may well have altered it from
the shapefile in order to merge changes with another mapper's view
of the world, to simplify ways, to correct topology, or otherwise to make
the object conformant with OSM practices.

I wish to (a) locate the OSM object that corresponds with the object
in the shapefile, and (b) determine whether it has changed since I saw
it before. Ideally I want to do both without having to maintain a persistent
external store other than the shapefile itself and the OSM database,
because I want someone else to be able to run my script if I am not
available, and don't want to have to get into the business of distributing
a third artifact that would give the external ID<->OSM ID mapping.

(b) is handled well with changesets, and in fact that's what I use.
I check whether the most recent changeset is mine. (And in fact,
part of my workflow, in the event that the object has changed, involves
doing a three-way differential comparison between the version
that I last imported, the latest version, and the version in the
external data set.)
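
For illustration only, here is a minimal Python sketch of that kind of
check. It is not the actual import script. It assumes each OSM element
is a dict shaped like Overpass API JSON output with metadata (keys such
as "uid" and "tags"), and the function names last_edit_is_mine and
three_way_merge are hypothetical.

```python
def last_edit_is_mine(element, my_uid):
    """True if the newest version of the element was uploaded by my_uid.

    Assumes `element` carries the "uid" metadata field of its latest version.
    """
    return element.get("uid") == my_uid


def three_way_merge(base, current, external):
    """Merge tag dicts: base = tags as last imported, current = tags now
    in OSM, external = tags derived from the updated external data set.

    Returns (merged, conflicts). A key lands in conflicts when OSM and the
    external data both changed it, and to different values.
    """
    merged, conflicts = {}, {}
    for key in sorted(set(base) | set(current) | set(external)):
        b, c, e = base.get(key), current.get(key), external.get(key)
        if c == e:
            value = c            # both sides agree (or both deleted it)
        elif c == b:
            value = e            # only the external data changed
        elif e == b:
            value = c            # only OSM mappers changed it
        else:
            conflicts[key] = (b, c, e)
            value = c            # keep the OSM value pending human review
        if value is not None:
            merged[key] = value
    return merged, conflicts
```

The conflict case deliberately keeps the value already in OSM, so that
another mapper's work is never overwritten without a person looking at it.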

But how do I do (a), and not have to get into doing data distribution of
yet a third artifact? I can't do an exact match on geometry - I likely
tweaked it (simplifying ways, correcting topology, ...).  I can't do
a simple match on name - that, too, may have been misspelt,
miscapitalized, or abbreviated, ....  What is left that I can
match on, if I don't record some sort of stable key?
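
To make the role of the stable key concrete, here is a minimal Python
sketch of step (a) when such a key is recorded as a tag. It is an
illustration under assumptions, not the actual script: elements are
dicts shaped like Overpass API JSON output, the function name
match_by_external_id is hypothetical, and the tag key is passed in as a
parameter because the real key depends on the import.

```python
def match_by_external_id(external_ids, osm_elements, id_key):
    """Pair external IDs with the OSM elements that carry them in a tag.

    external_ids: iterable of stable IDs read from the shapefile.
    osm_elements: iterable of dicts, each with an optional "tags" dict.
    id_key:       tag key holding the external ID (an assumption here).

    Returns (matched, missing, ambiguous):
      matched   - {external_id: element} with exactly one candidate
      missing   - [external_id, ...] with no candidate in OSM
      ambiguous - {external_id: [elements]} with several candidates
    """
    # Index the OSM side once so each lookup is a dict access.
    index = {}
    for element in osm_elements:
        ext_id = element.get("tags", {}).get(id_key)
        if ext_id is not None:
            index.setdefault(ext_id, []).append(element)

    matched, missing, ambiguous = {}, [], {}
    for ext_id in external_ids:
        hits = index.get(ext_id, [])
        if len(hits) == 1:
            matched[ext_id] = hits[0]
        elif not hits:
            missing.append(ext_id)      # new upstream, or tag removed in OSM
        else:
            ambiguous[ext_id] = hits    # e.g. a way split by another mapper
    return matched, missing, ambiguous
```

Nothing here depends on geometry or name, which is the point: both can
drift after the import while the key stays put.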

Recall that this is not a hypothetical argument. I have an existing
workflow that functions for me. You are insisting that I need to change
it to remove what amounts to a single tag on each imported object,
because that is too much computational and intellectual overhead
for the rest of OSM to deal with. I consider that the burden of proof
is on you to demonstrate a workable alternative.

If that alternative requires me to maintain a separate, external
database, also under version control, that I must distribute to
and synchronize with everyone who helps me with an update,
then, pray tell, why should I bother contributing the data to OSM
at all rather than just keeping it to myself and my friends?

Simply telling me that I'm ignorant and lazy doesn't address my use
case.