[talk-ph] bulk editing address info in POIs

Mike Collinson mike at ayeltd.biz
Sun Aug 16 13:55:35 BST 2009


At 03:55 PM 13/08/2009, Eugene Alvin Villar wrote:
>Here's my two cents regarding this:
>
>I don't favor using addr:city, addr:village, is_in to specify where a POI is. Here are the cons:
>
>1. Duplication of info with admin borders (and potential mismatch issues)
>2. Increased data size with respect to tags (which makes planet dumps larger)
>
>On the other hand, here are the pros:
>
>1. POIs are easier to filter by place than with the alternative, a bounding-polygon calculation, which is more computationally intensive. This cost can be mitigated somewhat by pre-processing the data just before it is used (e.g., as an additional step when making Garmin maps.)
>2. Identifies where a POI is in the (hopefully temporary) lack of boundary data.
>
>Regardless, addr:street is essential since this is very hard to infer from the data without it.
>
>
>Anybody else have other thoughts?
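
To make Eugene's first point concrete, here is a rough Python sketch of the two ways of answering "which POIs are in this place?". It assumes POIs are plain dicts with hypothetical "lat", "lon" and "tags" keys (not any OSM library's format) and a boundary given as a single closed ring of (lon, lat) pairs treated as flat coordinates. The polygon test is the standard even-odd ray-casting method; multipolygons and holes are ignored.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat), treated as planar coordinates


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting: count edges crossed by a ray going east from pt."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through pt can be crossed
        # (this also guarantees y1 != y2, so the division below is safe).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical POI format: {"lat": ..., "lon": ..., "tags": {...}}
def pois_in_boundary(pois: List[Dict], ring: Sequence[Point]) -> List[Dict]:
    """Geometric filter: walks every boundary vertex for every POI."""
    return [p for p in pois if point_in_polygon((p["lon"], p["lat"]), ring)]


def pois_tagged(pois: List[Dict], key: str, value: str) -> List[Dict]:
    """Tag filter: one dictionary lookup per POI, no geometry needed."""
    return [p for p in pois if p.get("tags", {}).get(key) == value]
```

The tag filter costs one lookup per POI, while the polygon test costs one pass over the boundary per POI, which is why doing it once in a pre-processing step, as Eugene suggests, takes most of the sting out.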

In my own mapping, and with an interest in preparing OSM data for first-generation gazetteer and search software, I generally go for "the more the better", broadly for the reasons Eugene outlines. Redundancy is heresy in database programming courses, but I think that rests on an assumption that data is entered under strict rules and in a controlled environment. For us, I think redundancy (partial duplication, but from different sources and methodologies) is actually a good thing ... later pruning is not impossible. Perhaps in two or three years' time, boundary data and the software to process it easily will be widely available, but for now, I say leave 'em in!

Size of planet dumps: yes, a concern, especially when you are trying to do a dial-up download, something the Europeans forget. But while POIs may number in the thousands in an area, the ways in the same area may have hundreds of thousands of nodes, especially if over-digitised. Taking into account all the XML wrapping around a node, a POI is not that much bigger than a raw lat,lon node. Planet dumps are going to get too big anyway, so I kind of see value in forcing the issue sooner rather than later.
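
A back-of-the-envelope way to check the size argument is sketched below in Python. It serializes a bare node and a tagged POI as minimal OSM-style XML and works out what fraction of the node bytes the POIs account for. The element layout is simplified (real planet nodes also carry version, timestamp, user, uid and changeset attributes, which, if anything, makes the tags look proportionally larger here), and the counts, coordinates and tag values are made-up examples, not measurements.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def node_xml_size(node_id: int, lat: float, lon: float, tags: Dict[str, str]) -> int:
    """Byte length of a minimal OSM-style <node> element with <tag/> children.

    Simplification: the version/timestamp/user/uid/changeset attributes that
    real planet nodes carry are left out.
    """
    node = ET.Element("node", id=str(node_id), lat=f"{lat:.7f}", lon=f"{lon:.7f}")
    for k, v in tags.items():
        ET.SubElement(node, "tag", k=k, v=v)
    return len(ET.tostring(node))  # bytes, no XML declaration


def poi_share(n_way_nodes: int, way_node_bytes: int, n_pois: int, poi_bytes: int) -> float:
    """Fraction of all node bytes taken up by POIs."""
    total = n_way_nodes * way_node_bytes + n_pois * poi_bytes
    return n_pois * poi_bytes / total


# Hypothetical area: 200,000 way nodes and 2,000 POIs with address tags.
bare = node_xml_size(1, 14.5995, 120.9842, {})
poi = node_xml_size(2, 14.5995, 120.9842, {
    "amenity": "school",
    "name": "Example Elementary School",
    "addr:street": "Rizal Avenue",
    "addr:city": "Manila",
})
print(bare, poi, f"{poi_share(200_000, bare, 2_000, poi):.1%}")
```

Swapping in real counts for a given area would show whether the address tags on POIs are a noticeable share of a dump or lost in the way nodes.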

I have, by the way, now switched to using explicitly identified is_in:* tags, using the place= values where possible and user-defined values where they give some local benefit.

is_in:country, is_in:state, is_in:city, is_in:town ...
is_in:island, is_in:sea
is_in:valley, is_in:barangay, ...
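
For anyone preparing data from tags like these, a minimal Python sketch of pulling the is_in:* keys out of a tag set and filtering on one level might look like this. It assumes each object's tags are a plain dict of strings; the helper names and the example values ("Makati", "Poblacion") are hypothetical, and matching is exact string comparison, so spelling variants of the same place will not match.

```python
from typing import Dict, Iterable, List

IS_IN = "is_in:"


def containers(tags: Dict[str, str]) -> Dict[str, str]:
    """Collect is_in:<level> tags as {level: value}, e.g. {'barangay': 'Poblacion'}."""
    return {k[len(IS_IN):]: v for k, v in tags.items() if k.startswith(IS_IN)}


def filter_by_container(
    objects: Iterable[Dict[str, str]], level: str, name: str
) -> List[Dict[str, str]]:
    """Keep tag sets whose is_in:<level> value equals name exactly."""
    return [tags for tags in objects if containers(tags).get(level) == name]


# Hypothetical example objects.
cafe = {"amenity": "cafe", "is_in:city": "Makati", "is_in:barangay": "Poblacion"}
print(containers(cafe))
print(filter_by_container([cafe, {"is_in:city": "Taguig"}], "city", "Makati"))
```

Keeping the level in the key (rather than one free-form is_in value) is what makes this a simple lookup instead of string parsing.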

I am interested to see whether we can collect enough points to generate reasonable boundaries from points rather than the other way around.
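
One crude way to try the points-to-boundaries idea is to take the convex hull of all points sharing an is_in:* value. The Python sketch below uses Andrew's monotone-chain algorithm; the choice of a convex hull is mine, not something proposed in this thread, and it will overestimate concave or irregular areas. It assumes the same hypothetical POI format as plain dicts with "lat", "lon" and "tags" keys, and treats coordinates as planar.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lon, lat), treated as planar coordinates


def cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other; drop duplicates.
    return lower[:-1] + upper[:-1]


# Hypothetical POI format: {"lat": ..., "lon": ..., "tags": {...}}
def rough_boundary(pois: List[Dict], level: str, name: str) -> List[Point]:
    """Hull of all POIs tagged is_in:<level>=<name>."""
    pts = [(p["lon"], p["lat"]) for p in pois if p["tags"].get("is_in:" + level) == name]
    return convex_hull(pts)
```

With enough well-spread points the hull gives a first rough outline; something like a concave hull or Voronoi cells between neighbouring places would follow the real shape more closely.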

Just my thoughts!

Mike 