[OSM-talk] Annouce : Planet incremental search

Nic Roets nroets at gmail.com
Sun Jun 10 00:10:28 BST 2007


The wiki is at http://wiki.openstreetmap.org/index.php/Gosmore

> and perhaps explain how you manage to deal with the large amounts of
> data involved? I know I could just get the source but I'd love to have a

I divide the world into tiles and hash the tiles into buckets. This
O(1) process is much faster than R-trees or B-trees provided your
tiles are big enough and you have a good way of organizing your
buckets. Within my buckets all half segments of the same "node" are
stored together, but this only helps with routing.

I eliminate nodes and work with "half segments" instead, and I believe
this configuration to be superior in all ways. It requires less
storage and is faster, but more significantly, the logic is simpler :
For example we currently have that lint problem where two  nodes can
be at the same coordinates, which will not occur with half segments,
not to mention the extra layer of indirection. You may well ask if a
user will be able to move a "node" with many half segments connected
to it with JOSM / potlatch. The answer is JOSM will pass such an
operation on to the database as a command "Move all half segments at
(x,y) to (a,b)". The only situations where this approach will yield
different results from the current one is (a) where two users moves
the same "node" and (b) where a user accidentally places one node
exactly on another. Both of these situations rarely / never occur.

Since gosmore only work with a static data set, I just sort the data
and didn't need to use trees (e.g. Btrees) for anything. Not only are
they complicated, but they also add a lot of storage overhead (parent
and child pointers).

I also use 32 bit integers. They provide 1 cm and better resolution.
The most frequent operation on coordinates are comparation (e.g. is
this node in this bounding box) and not operations like multiplication
at which doubles are sometimes faster than ints.

Together these to factors reduce the load on the disk.

Not having a client server architecture also helps. Something that
mapnik can do with.




More information about the talk mailing list