[OSM-talk] Import guidelines proposal update

Lester Caine lester at lsces.co.uk
Sat Sep 22 07:46:50 BST 2012


Paul Norman wrote:
>> From: Lester Caine [mailto:lester at lsces.co.uk]
>> Subject: Re: [OSM-talk] Import guidelines proposal update
>>
>> who last edited an object! ). Where the import HAS nice unique object
>> identifiers things are a lot easier, but raw vector data like the French
>> import, and I think the Spanish data you are talking about CAN still be
>> 'diffed' against earlier imports, and result in perhaps new data that
>> can simply be imported, or perhaps an overlay that identifies conflicts
>> that need a human eye. Isn't it better to spend time working out a GOOD
>> way of using the data going forward rather than having to manually merge
>> the whole lot again in a couple of years time ... and every couple of
>> years.
>
> My thoughts on how to handle this for data with persistent unique
> identifiers without adding those as tags is to

******
> a. Record the correspondence between source ID and temporary pre-upload
> negative OSM ID
>
> b. Record the correspondence between pre-upload negative OSM ID and OSM ID
>
> c. Combine for a correspondence between source ID and OSM ID, and save this
******
EXCEPT - that requires ALL the data from the external import to be loaded in 
order to create the OSM IDs, which may not be a bad thing ... BUT part of the 
'preprocessing' before ever uploading the import would be to identify which 
objects are going to be uploaded and which are not, so you need an 'id' 
initially related to the data source. That is, provided the data source 
actually contains identifiable objects.
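Steps a-c above are really just two lookup tables composed into one. A minimal sketch - the source IDs and the numbers are invented for illustration, but the negative-to-real mapping is exactly what the API hands back after a changeset upload:

```python
# Hypothetical sketch of the a/b/c bookkeeping; IDs are made up.

# a. While building the upload, record source ID -> temporary negative OSM ID.
pre_upload = {"SRC-1001": -1, "SRC-1002": -2}

# b. After upload, record negative placeholder ID -> real OSM ID
#    (this is what the API's upload response provides).
diff_result = {-1: 4203001, -2: 4203002}

# c. Combine the two into a persistent source ID -> OSM ID table,
#    which is what later re-imports are diffed against.
correspondence = {src: diff_result[neg] for src, neg in pre_upload.items()}
print(correspondence)  # {'SRC-1001': 4203001, 'SRC-1002': 4203002}
```

The point of saving the combined table is that the source IDs never need to be added as tags on the OSM objects themselves.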

What I had not considered until now is the case where the data source is simply 
a raw vector file - a vector version of a paper map. While the individual lines 
could be 'imported', the data is almost useless until it has been 'identified'; 
you might just as well trace. But even here all is not lost, since one can 
still pre-process the data and record which lines have been copied and which 
have not. In that case the OSM ID provides additional data back to the source, 
but I doubt there is any value in simply importing millions of line segments 
directly into the main database? That has to be handled in a secondary staging 
area.

> d. When updating, identify objects that have changed or been added to the
> source
>
> e. For changed or deleted objects if the OSM object was last edited by the
> importer's import account, upload a new version reflecting the changes.
> Objects that have been edited by a person will require manual intervention,
> like now
>
> f. Handle new objects like before
> 	
> g. Identify objects deleted in OSM and check these, then submit corrections
> to the source.
>
> The one case this doesn't handle very well is POIs that have been changed
> from a node into a way.
>
> I'm going to be working on implementing this in a limited way for updating
> addresses locally. Addresses are different because the address should be
> unique in the city.
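Paul's steps d-g amount to a three-way split of each new source extract - automatic updates, manual review, and corrections back upstream. A rough sketch, assuming we already hold the source-ID -> OSM-ID table from steps a-c (the record layout and the 'import_bot' account name are invented for illustration):

```python
# Illustrative only: field names and the import account are assumptions.
IMPORT_ACCOUNT = "import_bot"

def classify(source_objs, osm_objs, correspondence):
    """Split a fresh source extract into automatic vs manual work (steps d-g)."""
    auto_update, needs_review, new_objects = [], [], []
    for src_id, obj in source_objs.items():
        osm_id = correspondence.get(src_id)
        if osm_id is None:
            new_objects.append(obj)            # f. handle like a fresh import
            continue
        osm_obj = osm_objs[osm_id]
        if obj["tags"] == osm_obj["tags"]:
            continue                           # d. unchanged, nothing to do
        if osm_obj["last_editor"] == IMPORT_ACCOUNT:
            auto_update.append((osm_id, obj))  # e. safe to upload new version
        else:
            needs_review.append((osm_id, obj)) # e. a person edited it: manual
    # g. objects we track that have vanished from OSM -> check, report upstream
    deleted_in_osm = [s for s, o in correspondence.items() if o not in osm_objs]
    return auto_update, needs_review, new_objects, deleted_in_osm
```

As Paul notes, this still does not cope with a POI that a mapper has turned from a node into a way, since the tracked OSM ID then points at the wrong object type.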

While the UK 'address database' can't be uploaded freely yet, I have been 
slowly importing data manually, and it irritates me that every building carries 
duplicates of much of this data. I know a few attempts have been made at 
relations and the like to group things, but as I have said in the past, isn't 
now the time to provide a mechanism that uses 'lookup tables' for some of this, 
which would automatically simplify what is stored in the tags on each object? 
An 'address' in the UK only needs to be the 'property id' - house/flat number 
or name - and the 'postcode'; everything else can be provided by a 'lookup' on 
the postcode reference. In practice this does not quite work, simply because 
the 'postcode' has too many edge cases where you need additional information to 
derive the 'street'. That is why the NLPG data does not rely on the postcode 
and only carries a reference to it deep inside; instead it provides a street 
gazetteer with a clean reference number for each street ( and in theory a 'way' 
for the physical location, but in most cases this is just a couple of 'end 
points' :( ).

This is the sort of process that could simplify a LOT of the micro/macro 
mapping problems that are now building up, since a worldwide 'street gazetteer' 
is the base for all of the routing programs, and a top-level map in its own 
right. All of the problems of 'turn restriction' relations would be managed in 
the 'street gazetteer', while the underlying map can display all of the pretty 
stuff such as grass verges, footpaths and the like.
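The lookup-table idea can be shown with a toy gazetteer. The reference numbers and field names below are invented - NLPG street reference numbers work on this principle, but this is not real data:

```python
# Toy 'street gazetteer': one clean reference number per street,
# carrying the data that would otherwise be duplicated on every building.
gazetteer = {
    14302877: {"street": "High Street", "town": "Gloucester"},
}

# Each building then only needs its property id plus the street reference,
# rather than repeating addr:street / addr:city tags on every object.
buildings = [
    {"housenumber": "12", "street_ref": 14302877},
    {"housenumber": "14", "street_ref": 14302877},
]

def full_address(building):
    """Expand a building's compact tags via the gazetteer lookup."""
    ref = gazetteer[building["street_ref"]]
    return f'{building["housenumber"]} {ref["street"]}, {ref["town"]}'

print(full_address(buildings[0]))  # 12 High Street, Gloucester
```

Renaming a street then becomes a single edit to the gazetteer entry rather than a retagging of every building along it, which is the simplification argued for above.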

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk


