[Imports] New module to merge-sort imports over time (osmfetch python)

Fri Aug 26 09:56:08 UTC 2011

On 26.08.2011, at 8:34, Bryce Nesbitt wrote:

>> On Thu, 25 Aug 2011, Bryce Nesbitt wrote: 
>> 
>>> I disagree here.  If the external source is "ground truth", then that's the data that should take precedence.  A car share operator for example won't want a disused location shown, and may well make that a requirement of permitting the merge-sort.  An import from a car share reservation system is definitive ground truth. 
>> 
>> A car share operator, or anyone else, can go in and mark a location as closed or delete it, but the operator isn't automatically considered more authoritative than any other mapper.  For example, let us say a fast-food chain imports a database of their restaurant locations and marks some of them as wheelchair=yes, but I go visit the location and feel it doesn't qualify for that tag. The operator's database can't claim ground truth or authority over someone who has actually visited that location, and an automated script shouldn't 'undo' my change every month.
> 
> That's the cool thing about the proposed approach.  Which source is the authority for each tag is scriptable.  wheelchair=yes can (and would) be mastered in OSM. Exact lat/lon would always be mastered in OSM.  Heck, the script could even open an OpenStreetBug if it cared to resolve a minor-tag discrepancy ("Chain-store says toilets=permissive, local mapper says toilets=no, who is right?").  But in general I'd leave all those tags to humans.
> 
> But if the fast-food chain claims a store is closed, well... I'd go with that as first cut.
> Similarly if fast-food chain says a store is now open.
> If the car-share reservation system says there is now a "Prius" and a "Batmobile" for hire, I'd go with that over older community data that is likely stale.
> 
> The car share data produced by the community process was highly spotty.  The reservation system data is complete.  But you can have it both ways: osm contributors can add all sorts of tags (description, photos, etc.) and the merge process will keep the best of both on the same node, with full history.
> 
> The automated tool in question already shows the human operator the diff: so a human is still in control.  Perhaps it could be extended to detect and flag any potential edit wars (e.g. same tag 'corrected' twice)?  Would that satisfy the objection?
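The per-tag authority idea in the quoted reply can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the osmfetch module itself: the `AUTHORITY` table, the `merge_tags` and `edit_war_keys` helpers, and the overwrite-log format are all hypothetical. Tags mastered by the external source overwrite OSM values; tags mastered in OSM are kept, and any disagreement is returned for a human to review (for example as an OpenStreetBug). Coordinates are not touched, since lat/lon stays mastered in OSM.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical per-tag policy: which source "masters" each key.
# Keys not listed default to OSM, so community-added tags are never overwritten.
AUTHORITY: Dict[str, str] = {
    "capacity": "external",   # number of cars: the reservation system is complete
    "operator": "external",
    "wheelchair": "osm",      # an on-the-ground survey beats the chain's database
    "toilets": "osm",
}


def merge_tags(osm: Dict[str, str], external: Dict[str, str],
               authority: Dict[str, str] = AUTHORITY) -> Tuple[Dict[str, str], List[str]]:
    """Merge external tags into an OSM node's tags.

    Returns the merged tags and the sorted keys where both sources disagree
    but OSM is the master (candidates for a human to review)."""
    merged = dict(osm)  # start from OSM so description, photos, etc. survive
    disputes = []
    for key, value in external.items():
        if key not in osm:
            merged[key] = value              # fill in tags OSM does not have yet
        elif osm[key] != value:
            if authority.get(key, "osm") == "external":
                merged[key] = value          # external source is ground truth here
            else:
                disputes.append(key)         # keep the mapper's value, flag it
    return merged, sorted(disputes)


def edit_war_keys(overwrite_log: List[Tuple[int, str]],
                  threshold: int = 2) -> List[Tuple[int, str]]:
    """Flag (node_id, key) pairs the sync run has had to overwrite `threshold`
    or more times, i.e. the same tag 'corrected' repeatedly."""
    counts = Counter(overwrite_log)
    return sorted(pair for pair, n in counts.items() if n >= threshold)
```

With `osm = {"wheelchair": "yes", "capacity": "2"}` and `external = {"wheelchair": "no", "capacity": "3"}`, `merge_tags` keeps `wheelchair=yes`, takes `capacity=3` and reports `["wheelchair"]` as a dispute. Defaulting unknown keys to "osm" means a tag nobody thought about is never overwritten, which addresses the concern that a script should not undo a surveyed change every month.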

What is this "car share location", really? Some special spot, or co-located with other amenities (gas stations, bus stations, buildings, etc.)? If the objects are really autonomous nodes, connected to nothing, then your solution could work well. I can imagine a similar situation with other very specific datasets: say, elevation info (DEM), or half-virtual objects like geocaches.

If the spots are shared, then you have to merge them with existing (possibly conflicting) tags and locations, link points with ways, etc. How exactly would that work with your script?

I cannot resist also proposing an OpenMetaMap (OMM) solution for your case:

a) if spots are autonomous:
1. Car share operator publishes their data as OSM file. They need to do it for import anyways.
2. They add their URL to the OMM data directory.
3. Users (like the main OSM Mapnik renderer) find the latest data there and add it as a data layer. I guess this is your main reason to import it.

No data duplication or syncing is needed.
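Step 1 of case a), publishing the operator's data as an OSM file, needs little more than the Python standard library. The sketch below assumes the operator's records arrive as dicts with `lat`, `lon` and `tags` keys (a hypothetical format, as is the `carshare-export` generator name) and writes OSM XML 0.6 with negative ids, the convention JOSM uses for objects not yet in the OSM database. Registering the file's URL in an OMM data directory is not shown, since that directory is only proposed here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def to_osm_xml(locations: List[Dict]) -> str:
    """Serialize location records (hypothetical format: lat, lon, tags) as OSM XML 0.6."""
    root = ET.Element("osm", version="0.6", generator="carshare-export")
    for i, loc in enumerate(locations, start=1):
        # Negative ids mark objects that do not exist in the OSM database.
        node = ET.SubElement(root, "node", id=str(-i),
                             lat=f"{loc['lat']:.7f}", lon=f"{loc['lon']:.7f}")
        for key, value in sorted(loc["tags"].items()):
            ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")
```

A consumer such as a renderer would then load this file as a separate layer rather than importing it into the OSM database.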

b) if the dataset is linked to existing OSM objects, it is more complicated:
1.-2. Same as above.
3. Open JOSM and download both datasets (OSM and yours).
4. Merge the data: select 2 points, click "Edit > Merge points", resolve tag conflicts if found, and check the location. This creates the OMM links for you. It takes exactly the same number of actions as manual merging would anyway.
5. Save the data: the OSM object is updated only if you moved the point or changed tags. Mostly you save links to OMM.
6. Users take data from OSM, your database and the OMM links, and merge them on the fly. Broken links are not rendered, just as with your import/sync script.
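Step 6, merging OSM data, the provider's dataset and the OMM links on the fly, might look roughly like this. The `Link` record and the rule that OSM values win on conflict are assumptions for illustration; OMM is only a proposal, so there is no real link format to follow. A link whose OSM node or external record has disappeared is simply skipped, which is the "broken links are not rendered" behavior.

```python
from dataclasses import dataclass
from typing import Dict, List

Tags = Dict[str, str]


@dataclass(frozen=True)
class Link:
    """Hypothetical OMM link: one OSM node id paired with one external record id."""
    osm_id: int
    external_id: str


def merge_on_the_fly(osm_nodes: Dict[int, Tags], external: Dict[str, Tags],
                     links: List[Link]) -> Dict[int, Tags]:
    """Return combined tags for every linked object whose link is still intact."""
    result: Dict[int, Tags] = {}
    for link in links:
        osm_tags = osm_nodes.get(link.osm_id)
        ext_tags = external.get(link.external_id)
        if osm_tags is None or ext_tags is None:
            continue  # broken link: one side was deleted, so nothing is rendered
        combined = dict(ext_tags)
        combined.update(osm_tags)  # assumed rule: OSM values win on conflict
        result[link.osm_id] = combined
    return result
```

Because nothing is written back to OSM, a stale external record only affects the merged view and never leaves outdated foreign-key tags in the OSM database.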

Advantages:
 - Maintenance-free: the data provider does not need to re-run sync scripts. Data rots not only because there are no manual edits, but also because sooner or later you stop running the script, so the foreign-key tags in OSM become outdated.
 - The principal difference is that you give full control over the data links to the OSM community. If you are not there to update them, they will, at least if the data is really relevant for the community.

Disadvantages / limitations:
- Many more tools and much more code are needed than a single Python script. It cannot be done today.
- One more decision for the mapper: should my contribution also be added to OSM, or kept in the external dataset? There should be best-practice guidelines for this, or perhaps the tool could make the decision for the user. For the car sharing operator it does not really matter: they would get the contribution either way, from OMM or from the OSM database (with the help of OMM).
- The external API will become unavailable at some point. It may take years, but it will happen. For this, OMM should have a "persistent cache" (archive) option, if the data source allows it.
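The "persistent cache" option for when the external API disappears could be as simple as keeping the last good download and falling back to it when a fetch fails. This is a sketch only: `fetch_with_archive` and the archive path are hypothetical, and whether archiving is allowed at all depends on the data source's license, which the code does not check.

```python
import urllib.request
from pathlib import Path


def fetch_with_archive(url: str, archive: Path, timeout: float = 10.0) -> bytes:
    """Fetch `url`; on success refresh the local archive copy, on failure serve it."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = resp.read()
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        if archive.exists():
            return archive.read_bytes()
        raise  # no archived copy yet: nothing to fall back to
    archive.write_bytes(data)
    return data
```

Storing raw bytes keeps the archive format-agnostic, so it works the same whether the source publishes OSM XML or something else.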

Jaak
