[Talk-GB] OSM should not be a database dump.

James Derrick lists at jamesderrick.org
Sat Feb 11 10:20:17 UTC 2023


Hi,
On 10/02/2023 16:33, SK53 wrote:
> Adding lots of potentially out-of-date data to OSM tends to move the
> project from being one of mapping things to one about maintaining a
> somewhat out-of-date database.

+1

We need to think of the lifecycle "cost" of the data collected and 
stored and MAINTAINED by OSM.

It's not just about storing "other entities' data" in OSM (a database
dump); it's about what happens over the next ten years, and the tooling
required.


Using widely consumed data in a consistent manner is absolutely OSM's 
goal - so using external databases to validate and enhance our coverage 
is great.

Adding a limited number of external "foreign key" reference IDs has some
value, as it can assist future maintenance checks, e.g. if a school
changes its name, or a takeaway becomes a house (the sort of check Rob
Whittaker's Survey Me! tool does well, to name but one).
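To illustrate how a foreign key helps such a check, here is a minimal sketch in Python. It assumes two hypothetical in-memory extracts (OSM features keyed by OSM id, and an external dataset mapping its own IDs to current names) and joins them on `fhrs:id`; the function and class names are made up for illustration and no real API is called.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Mismatch:
    osm_id: str
    ref: str
    osm_name: str
    external_name: Optional[str]  # None means the ID has vanished from the external data


def check_against_external(
    osm_features: Dict[str, Dict[str, str]],
    external_names: Dict[str, str],
    ref_key: str = "fhrs:id",
) -> List[Mismatch]:
    """Flag OSM features whose name disagrees with an external dataset.

    Both inputs are hypothetical extracts; the foreign key (ref_key) is
    the only thing joining the two datasets.
    """
    mismatches = []
    for osm_id, tags in osm_features.items():
        ref = tags.get(ref_key)
        if ref is None:
            continue  # no foreign key: this feature cannot be checked automatically
        external_name = external_names.get(ref)
        if external_name != tags.get("name"):
            # A prompt for a human to survey, not proof of which side is right
            mismatches.append(Mismatch(osm_id, ref, tags.get("name", ""), external_name))
    return mismatches
```

A feature whose `fhrs:id` no longer appears in the external data comes back with `external_name` set to `None`, which is roughly the "takeaway became a house" case. Features without the key are silently skipped, which is the other half of the trade-off: no foreign key, no automated check.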

The issue is when the effort required to maintain the OSM data exceeds 
the value to OSM consumers.


You can add a shop in two keys `shop=supermarket` `name=Iceland` - works 
fine.

These days, best practice is to add several other keys that point to
external databases, and in my opinion it is getting out of hand:

`brand=Iceland`
`brand:wikidata=Q721810`
`brand:wikipedia=en:Iceland (supermarket)`

And that's without `ref=`, `fhrs:id=`, `fhrs:local_authority_id=`,
`branch=` and `contact:website=`.

Suddenly, you are looking at maintaining nine duplicated keys where two
would do: a higher barrier to entry, and an ongoing maintenance cost.


Okay, the extras are typically added later by armchair mappers (yes, I 
do both survey and armchair by season) but...

Say Iceland decides to change its brand strategy: we're into an
automated update to Food Warehouse across all the keys, but not all
branches are moving, and some are closing, so we're looking at a
quarterly project, then a ghost hunt for old Maplin stores...
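A rough Python sketch of why such a rebrand is more than a find-and-replace. It assumes someone has already worked out which branches are moving and which are closing (the hypothetical `moving` and `closing` sets); the function name, the choice of keys and the placeholder Wikidata ID are illustrative assumptions, not an existing tool.

```python
from typing import Dict, List, Set, Tuple

# Brand-carrying keys; ref, fhrs:* and branch are deliberately left alone here
BRAND_KEYS = ("name", "brand", "brand:wikidata", "brand:wikipedia")


def plan_rebrand(
    branches: Dict[str, Dict[str, str]],
    moving: Set[str],
    closing: Set[str],
    new_tags: Dict[str, str],
) -> Tuple[Dict[str, Dict[str, str]], List[str]]:
    """Split old-brand branches into tag edits and survey tasks.

    branches: hypothetical extract of OSM id -> tags for the old brand
    moving:   OSM ids confirmed to be switching brand
    closing:  OSM ids announced as closing (closing wins if in both sets)
    new_tags: replacement values for BRAND_KEYS
    """
    edits: Dict[str, Dict[str, str]] = {}
    survey: List[str] = []
    for osm_id, tags in branches.items():
        if osm_id in closing:
            survey.append(osm_id)  # someone should check on the ground before deleting
        elif osm_id in moving:
            updated = dict(tags)  # copy so the input extract is not mutated
            for key in BRAND_KEYS:
                if key in new_tags:
                    updated[key] = new_tags[key]
                else:
                    updated.pop(key, None)  # drop a stale key we have no new value for
            edits[osm_id] = updated
        # branches in neither set keep their current tags
    return edits, survey
```

Even this toy version needs an external list of which branches move or close, and it leaves `ref=`, `fhrs:*` and `branch=` untouched, each of which may also be stale after the change.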


Don't get me wrong - there is value in performing these extra tasks (and 
I use the Chain Reaction tool for the extra references, and really like 
look-up tables of brands). We just need to consider the data lifecycle:
is there an API? Can we produce a tool? Is the data good?

The architect in me wants a single unique ID for each entity, but
postal_code was never it, and UPRN/UPRI is too encumbered (ironically,
to pay for the cost of maintaining it!)... so we might end up with OSM 
being the cross-reference database for the world's separated data - IF 
we can MAINTAIN all the foreign keys.


My plea is simply - think about the mapper standing in front of a thing 
in the rain, and adding tags. Think about the mapper correcting a 
spelling error in an armchair.

Do the tools exist to make the process easy and joyful?

Is it a slog through external data providers getting reference keys 
manually?

Does OSM get enough value from multiple external references (say, in
aiding consistency or maintenance)?

No API, no tools, no prospect of tools, no maintenance, so consider no 
import?


We need to think of the lifecycle "cost" of the data collected and 
stored and MAINTAINED by OSM.


James
-- 
James Derrick
     lists at jamesderrick.org, Cramlington, England
     I wouldn't be a volunteer if you paid me...
     https://www.openstreetmap.org/user/James%20Derrick