[OSM-dev] Suggestion about design of OSM data structure (take out segment component)

Thu Jul 26 17:22:47 BST 2007

Hi Frederik Ramm,

  You two main reasons to keep the segment may not valid.

  <<<
  1. Currently, we can have branching ways (e.g. Y-shaped). These need 
to be split into several ways when we remove segments. This is 
possible but probably difficult to automate fully (e.g. if you have a 
main road with lots of little branches, you want the branches to be 
split off and the main road to remain in one piece!).
 >>>

  Your argument is not valid because you are assuming that branching ways (Y-shaped) are 2 ways direction but they may be one way direction in the real life!

  For example: if we have an one-way direction high way that has 2 branches: 1 branch go in the main high way and another branch go out from the main high way, how you use way's segments to model this way without break it to 2 ways? Remember we do not have tag oneway=yes for each segment, we only have one tag oneway=yes for each way!

  Please see the attached picture for more details.

  <<<
  2. Currently, we can have non-contiguous ways, which are sometimes 
used to model "areas with holes" (say you have a forest with a 
clearing, then the forest will be one "ring" with clockwise segments 
and the clearing another "ring" with anticlockwise segments and both 
joined into one way). You cannot model this with simple ways alone; 
another way would have to be found for areas with holes in them.
>>>

  Without segments, I still can use simple way to model area with hole.
  For example: I have 11 nodes. Node: 1, 2, 3, 4, 10, 11 form the outer ring clockwise and nodes: 5, 6, 7, 8, 9 form inner ring anticlockwise. Two rings joined at node 4 and 5.

  So, to model the above area with hole, I can use simple way like this (1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 4, 10, 11)

  Please see the attached picture for more details.

  For your information, Garmin is using this method to model polygon with hole. They have only 2 data types (polygon has same structure with polyline):

  point(label,type,lat,lon)
  polyline(label,type,lat[0],lon[0],....,lat[n],lon[n])
  polygon(label,type,lat[0],lon[0],....,lat[n],lon[n])

  Please bear in mind two things:

  1. When you design data structure for map, you need to save any space as much as possible because map data are very huge.

  2. Data structure must be simple enough so that other program can process your data. If your data structure is so complicated (nodes, segments, ways) other program may not able to calculate shortest way for route planning or the program will spend a lot of time to calculate the route --> slow respond to user.

  I hope osm foundation will consider my arguments and take out the segment component from planet.osm xml file.

  If we can implement data structure like this:
  nodes (id, lat, lon, tag)
ways (id, nodeid[0],nodeid[1],...,nodeid[n], tag)

  Then we can save a lot of space in planet.osm xml file and other program will process osm data more easily (trim, convert...)

  Regards,

  Frederik Ramm's arguments:
<<<
  Chien,
  it is well known and long discussed that removing segments will 
make things easier.
  You are, however, slightly underestimating the complexity of the 
task, because of two reasons:
  1. Currently, we can have branching ways (e.g. Y-shaped). These need 
to be split into several ways when we remove segments. This is 
possible but probably difficult to automate fully (e.g. if you have a 
main road with lots of little branches, you want the branches to be 
split off and the main road to remain in one piece!).
  2. Currently, we can have non-contiguous ways, which are sometimes 
used to model "areas with holes" (say you have a forest with a 
clearing, then the forest will be one "ring" with clockwise segments 
and the clearing another "ring" with anticlockwise segments and both 
joined into one way). You cannot model this with simple ways alone; 
another way would have to be found for areas with holes in them.
  Bye
Frederik
  >>>

  Chien Nguyen Khac's arguments:
  <<<
  I have studied osm data stucture.
osm data stucture contains 3 main components (nodes,segments,ways):
nodes (id, lat, lon, tag)
segments (id, fromnodeid, tonodeid)
ways (id, segmentid[0],segmentid[1],...,segmentid[n], tag)

Actually, to store a map data we need only 2 components: nodes and ways.
nodes (id, lat, lon, tag)
ways (id, nodeid[0],nodeid[1],...,nodeid[n], tag)

I wonder why osm foundation introduce segment component in between node and way component? So I read the osm's wiki to find the answer.
osm foundation said that they need to use segment component because ways can share segment!
Well, it may be true in theory but in the real life how many segments have been shared among ways? I guess not so many.

So, by introducing segment component in between node and way to save some spaces (sharing segments between ways), osm foundation make osm data structure design waste a lot of space when they output the osm data to planet.osm in xml format!

Users (who use planet.osm like me) need to write our own program (perl, java or whatever language) to process a 4GB planet.osm text file. And this 4GB will grow up very fast in the near furture.

Furthermore, the current osm data structure design (nodes,segments,ways) make thing very dificult to process planet.osm xml file.

Forexample: if I want to get the data from London city from planet.osm, what I need to do?
My program need to do the following tasks:

1. Open and read planet.osm xml file
2. Trim all the nodes that are outside London's bounds --> write result to file
3. Trim all segments that are outside London's bounds --> write result to file
4. Trim all ways that are outside London's bounds --> write result to file

Task 2 is very simple, just check the node's lat and lon within the bounds --> after task 2 finish we have a set of nodes within bounds
Task 3 is a little bit complicated, we need to check segment's fromnodeid and tonodeid against the set of nodes within bounds -->  after task 3 finish we have a set of segments within bounds
Task 4 is extremely complicated, we need to check way's segment against the set of segments within bounds.

If we can implement a new data structure like this:
nodes (id, lat, lon, tag)
ways (id, nodeid[0],nodeid[1],...,nodeid[n], tag)

Then we can save a lof of space in planet.osm xml file and it is also easy for other programs to process planet.osm file.

Other programs just need to do fewer tasks:
1. Open and read planet.osm xml file
2. Trim all the nodes that are outside London's bounds --> write result to file
3. Trim all ways that are outside London's bounds --> write result to file

The program will require less memory and run faster because it does not need to allocate memory to store and process segment components.

  >>>

---------------------------------
Be a better Globetrotter. Get better travel answers from someone who knows.
Yahoo! Answers - Check it out.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/dev/attachments/20070726/3f4a3079/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: model ways without segment.PNG
Type: image/x-png
Size: 12206 bytes
Desc: 812827630-model ways without segment.PNG
URL: <http://lists.openstreetmap.org/pipermail/dev/attachments/20070726/3f4a3079/attachment.bin>