[OSM-dev] Area support

Brett Henderson brett at bretth.com
Thu Jul 10 04:09:44 BST 2008


Stefan Keller wrote:
> Agreed. An area (or polygon) simply is a basic geometry type; it's a 
> fact in practice and computational geometry. 
>  
> The actual XML encoding of ways never really was according to best 
> practices and will not scale. This is because even for ways we have to 
> read in all nodes until the end of the stream/file before any way can 
> be created. Now, when we misuse (application oriented) relations to 
> encode areas, processors are again forced to read in *all* relation 
> instances before any area can be instantiated... Can't we do that 
> better? (The craziness would be complete if one were to 'consistently' 
> encode areas *and* ways as partially ordered relation instances that 
> point to nodes.)
>  
> I would take this discussion as a chance to enhance the OSM/XML 
> encoding: Ways should be encoded with nodes (coordinates) embedded, 
> and areas (polygons) ought to be encoded with one way as outer 
> boundary and zero or more inner ways (boundaries) - embedded. I would 
> even differentiate areas which overlap and areas which don't (but this 
> is more on the conceptual and application modelling level and makes 
> no difference in the encoding). Look at the simplicity and usability 
> of such an XML encoding below...
> Stefan

A few comments:

Are you suggesting that we should represent areas as a set of ways?  
If so, would the xml structure below need to include the way id on each 
way element?  I wouldn't advise this.  It would require each set of ways 
to form a closed ring, which would be hard to enforce: every edit to a 
way would have to check whether that way is a member of an area.  Perhaps 
you didn't mean this though, and way was just a convenient xml element 
for grouping nodes within the xml.
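
To make the enforcement cost concrete, here is a minimal sketch in Python. It assumes each way is simply a list of node ids; the function name and data layout are made up for illustration and are not part of any existing API. It only checks that the ways can be joined end to end into closed rings, not that the result is a geometrically valid polygon.

```python
from collections import Counter


def ways_form_closed_rings(ways):
    """Return True if the ways can be joined end to end into closed rings.

    Each way is a list of node ids. Hypothetical helper, illustration only.
    """
    endpoint_uses = Counter()
    for nodes in ways:
        if len(nodes) < 2:
            return False  # a way needs at least two nodes
        # A way that is already closed counts its shared end node twice.
        endpoint_uses[nodes[0]] += 1
        endpoint_uses[nodes[-1]] += 1
    # Every end node must be used an even number of times, otherwise
    # some ring is left with a dangling end.
    return bool(ways) and all(n % 2 == 0 for n in endpoint_uses.values())
```

A check along these lines would have to run whenever any member way is edited, which is the part I think would be hard to enforce.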

I'm curious what value the geometry xml element adds but this is just 
semantics so I'm not too fussed.

I've been thinking about the inclusion of node lat/lon information in 
the file.  Initially I thought it was a good idea but on further thought 
I'm becoming convinced that it's not the way to go.
PROS
- Greatly simplifies stream processing of large osm files, avoiding 
  temporary (memory or disk-based) storage.
CONS
- It adds redundancy to the file.
- It increases the size of planet dumps.
- Post-processing can achieve the same result and can be done 
  efficiently with an intermediate database using changesets.
- It will add significant overhead to the existing planetdump and 
  osmosis changeset processes.
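
For reference, this is roughly what the temporary storage looks like with the current encoding. It is only a sketch in Python using the standard library's iterparse, assuming the usual node/way/nd layout with nodes appearing before the ways that reference them. The function name and the sample data are made up. The in-memory dict is the part that stops scaling on a full planet file.

```python
import io
import xml.etree.ElementTree as ET


def ways_with_coordinates(source):
    """Yield (way_id, [(lat, lon), ...]) from an OSM XML stream.

    Assumes nodes appear before the ways that reference them.
    """
    node_coords = {}  # the temporary storage: node id -> (lat, lon)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "node":
            node_coords[elem.get("id")] = (
                float(elem.get("lat")),
                float(elem.get("lon")),
            )
            elem.clear()  # drop the node's attributes and children
        elif elem.tag == "way":
            refs = [nd.get("ref") for nd in elem.findall("nd")]
            # Every way has to look its nodes up in the cache.
            yield elem.get("id"), [node_coords[r] for r in refs]
            elem.clear()


# Made-up sample data in the current node/way/nd layout.
SAMPLE = b"""<osm>
  <node id="1" lat="47.23439" lon="8.82187"/>
  <node id="2" lat="47.23411" lon="8.82362"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="natural" v="water"/>
  </way>
</osm>"""

result = list(ways_with_coordinates(io.BytesIO(SAMPLE)))
```

Embedding lat/lon in the way would remove node_coords entirely, which is the PRO above. The CONS are about what that costs everywhere else.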

It does have some immediate advantages but at the end of the day it is 
an optimisation and I believe there are other perhaps more effective 
ways of achieving the same result.  If we can move towards the use of 
databases and changesets instead of complete dump files we will scale 
more effectively.  In other words, the reason we have this issue is 
because the data is growing too large to hold node lat/lon information 
in memory and we want the main database to do the area-way-node 
correlation for us.  I'm suggesting that if we use a database locally 
this problem goes away and we gain the added advantage of being able to 
work with changesets instead of downloading complete snapshots every time 
we need an update.  We should be minimising the load on the primary 
database and moving non-essential (i.e. non-editing) work offline.
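
As a rough sketch of what I mean by a local database, here is a deliberately simplified example in Python with sqlite. The table names, column names and the tuple format for changes are all invented for illustration; this is not the schema or change format that osmosis uses. Changes are applied as small updates, and the way-node correlation becomes a join.

```python
import sqlite3


def open_local_db():
    """Create an in-memory database with a simplified, made-up schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, lat REAL, lon REAL);
        CREATE TABLE way_nodes (way_id INTEGER, seq INTEGER, node_id INTEGER,
                                PRIMARY KEY (way_id, seq));
    """)
    return db


def apply_node_changes(db, changes):
    """Apply (action, node_id, lat, lon) tuples.

    action is one of "create", "modify" or "delete".
    """
    for action, node_id, lat, lon in changes:
        if action == "delete":
            db.execute("DELETE FROM nodes WHERE id = ?", (node_id,))
        else:
            # create and modify both store the latest position
            db.execute(
                "INSERT OR REPLACE INTO nodes (id, lat, lon) VALUES (?, ?, ?)",
                (node_id, lat, lon),
            )
    db.commit()


def way_geometry(db, way_id):
    """Return the ordered list of (lat, lon) for one way."""
    rows = db.execute(
        "SELECT n.lat, n.lon FROM way_nodes w "
        "JOIN nodes n ON n.id = w.node_id "
        "WHERE w.way_id = ? ORDER BY w.seq",
        (way_id,),
    )
    return rows.fetchall()


db = open_local_db()
apply_node_changes(db, [("create", 1, 47.23439, 8.82187),
                        ("create", 2, 47.23411, 8.82362)])
db.executemany("INSERT INTO way_nodes VALUES (?, ?, ?)",
               [(10, 0, 1), (10, 1, 2)])
before = way_geometry(db, 10)

# A later change moves node 2; only that one row is touched.
apply_node_changes(db, [("modify", 2, 47.23400, 8.82400)])
after = way_geometry(db, 10)
```

The point is that an update costs in proportion to the size of the change, not the size of the planet.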

Hope that doesn't come across as too negative.  It's definitely worth 
having this discussion; there may be things I haven't considered.

> Example of an enhanced OSM data encoding of an area and its boundaries 
> (ways) as proper property types:
>  
>   <area id="4304746" timestamp="2008-03-25T21:31:01+00:00" user="anonymous">
>     <tag k="landuse" v="water"/>
>     <tag k="created_by" v="JOSM"/>
>     <tag k="natural" v="water"/>
>     <tag k="name" v="Lake of Zuerich"/>
>     <geometry>
>       <outerboundary>
>         <way>
>           <node lat="47.23439" lon="8.82187"/>
>           <node lat="47.23411" lon="8.82362"/>
>           ...
>         </way>
>         <way>
>           ...
>         </way>
>       </outerboundary>
>       <innerboundary>
>         <way>
>           <node lat="47.23411" lon="8.82111"/>
>           ...
>         </way>
>       </innerboundary>
>       <innerboundary>
>         <way>
>           <node lat="47.23499" lon="8.82199"/>
>           ...
>         </way>
>       </innerboundary>
>     </geometry>
>   </area>
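
For comparison, here is a sketch of how a consumer could read the proposed encoding in a single pass with no node lookup at all. The sample document is my own well-formed fill-in of the structure above (self-closed node elements, rings cut down to a couple of points), and the function names are invented.

```python
import io
import xml.etree.ElementTree as ET


def _rings(area, boundary_tag):
    """Collect one coordinate list per way under the given boundary tag."""
    return [
        [(float(n.get("lat")), float(n.get("lon"))) for n in way.findall("node")]
        for way in area.findall(f"geometry/{boundary_tag}/way")
    ]


def read_areas(source):
    """Yield (area_id, outer_rings, inner_rings) with coordinates inline."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "area":
            continue
        yield elem.get("id"), _rings(elem, "outerboundary"), _rings(elem, "innerboundary")
        elem.clear()  # the area is complete, nothing has to be remembered


# Made-up, truncated sample in the proposed layout.
SAMPLE = b"""<osm>
  <area id="4304746">
    <tag k="natural" v="water"/>
    <geometry>
      <outerboundary>
        <way>
          <node lat="47.23439" lon="8.82187"/>
          <node lat="47.23411" lon="8.82362"/>
        </way>
      </outerboundary>
      <innerboundary>
        <way>
          <node lat="47.23411" lon="8.82111"/>
        </way>
      </innerboundary>
    </geometry>
  </area>
</osm>"""

areas = list(read_areas(io.BytesIO(SAMPLE)))
```

This is the simplification listed under PROS: each area is self-contained, at the cost of repeating coordinates wherever a node is shared.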



