[OSM-dev] A proposal for linking multiple ways into a virtual polygon

Tue May 15 20:10:59 BST 2007

On Tue, 2007-05-15 at 20:32 +0200, Frederik Ramm wrote:
> Jon,
> 
> > What I'm thinking of is essentially a doubly-linked list pointer in the
> > ways. Each way which forms part of a polygon includes the IDs of the
> > ways which join on each end, eventually forming a closed loop:
> 
> [...]
> 
> > Tools like Josm can carry on editing these ways without needing any
> > explicit support for these tags. 
> 
> Similar proposals - all containing the basic idea that way/segment/node 
> ids be used as tag values in other obejcts - have been made for a number 
> of other things. They are, of course, all a workaround for the lack of 
> object relationships, and they all run the risk of being ruined by a 
> tool that is ignorant of these tags (e.g. someone accidentally deleting 
> and then reconstructing a way, or someone splitting a way in two with 
> JOSM which would retain all tags, etc.)

The split coastal ways are already the possible victims of many of these
issues. Yes things will get broken, but using double link lists makes it
trivial to detect such errors during the mapnik import processing:
"found non-close polygon way id=123 refers to deleted way id=567".

Without such attributes all we know is: "unable to find any way which
joins to the end 123 ... maybe it was never joined to anything, maybe it
was, but the other thing was deleted..., i've no idea what it should
connect to"

Adding the meta data helps make the joining of ways much more robust
IMO. The JOSM splitting with duplicated tags is also trivial to detect.

> You know that I don't know a lot about PostGIS and what you're doing in 
> your database so forgive me if this is way off the mark. But, given that 
> you convert the whole planet file into a relational database with all 
> that super GIS stuff and spatial indexing and so on, is it really so 
> difficult to do something like - wildly fantasizing pseudo-SQL here:
> 
> SELECT w1.id,w2.id FROM ways w1, ways w2 WHERE 
> w1.last_node_of_shape=w2.first_node_of_shape AND w1.tags=w2.tags
> 
> and then combine these pairs into one linear feature if w1 != w2, or 
> make an area feature from them if w1 == w2, and repeat the process until 
> none are found?
> 
> I can see how something like this would be difficult if one tries do do 
> it *while* importing the planet file, but once you have SQL at your 
> hands, it sounds as if it should be rather easy, and quick even.

Yes, the "while importing" is one key issue. The code currently does all
processing while streaming the data out to the DB. It does not query the
DB at all during the export. One reason for this model is to allow for
other backend instead of a DB (e.g. shapefiles, data stats
generation ...)

It should be possible to run a post-processor over the DB. The 'tags
being equal' test however would be very fragile. I think instead it
needs to check just a few important key/value pairs (natural=coastline
for example).

> If theoretically working on all 600.000 ways is too much strain on the 
> database, we could still introduce a tag saying "mapnik:recombine=true", 
> and stick that on every way that should, if possible, be merged with 
> adjoining ways.
> 

600k on its own isn't an issue, it is algorithms which can be O(n*n)
operating on 600k items which is an issue.

The obvious approach of trying to search the raw node/segment data is
pretty bad. Suppose I have a way, I can identify the two end nodes
(A,B). I then have to search every single segment(10M) to identify ones
which reference A,B. I then have to search the ways(600k) to find those
referencing the segments. 

Yes, this can be speeded up easily by generating indexes for traversing
node->segment->way but I fear this would add significantly to the memory
usage (probably doubling it). The osm2pgsql code has already been
criticised for excessive memory usage.

Adding next/prev pointer looks like the correct interim solution to me
since it provides a simple solution to joining ways for any tool.

If we fix the API and tools to handle these large polygons at some
future data then it is trivial to use these pointers to re-generate true
closed polygons again in the DB.

	Jon