[Openstreetmap] Large data structure formats? (was: How do we handle large amount of GPS points?)

Matt Amos matt at matt-amos.uklinux.net
Thu Nov 18 00:45:39 GMT 2004


On Thursday 18 November 2004 00:12, Petter Reinholdtsen wrote:
> So, I managed to get in touch with one linux user in Oslo, which
> has been driving around in the area with a GPS and a laptop for
> quite some time now.  He got >7 million GPS coordinates, which he
> uses to create a bitmap map of Oslo.  He was very positive to the
> idea of sharing this data set with us, and I got a copy of his
> current database.

cool! is it annotated?

> I guess my question is really, how should we handle such wast
> amount of points?  Steve, do you want 7 million points from Norway
> in your database?  Ready to handle it?  How do you want it?

yes!

as far as i see it, the post/my-sql databases should be able to handle 
this sort of load and search it pretty quickly - thats what databases 
are for, right?

on the software level i forsee some sort of partitioning system. the 
thoughts so far are:

1) stick to lat/long, with a standard quadtree. this presents some 
problems as the element size is completely variable depending on how 
far north you are, but there arent that many roads at the poles, 
right?

2) use the projection of a cube onto a sphere (oblate, if you like) to 
provide a 6-way quadtree. this has the disadvantage that it also has 
badly-shaped elements near the cube corners and will have no explicit 
connectivity between the six initial patches. however, it has no 
singularities.

3) use a Dymaxion projection-based system for a hierarchichal 
triangular mesh. this is also a quadtree, but more difficult to 
handle both mentally and programmatically. it also has no 
singularities and (certain formulations) guarantee no distortion of 
element size, but distortion of element quality or vice-versa (it 
appears that you can't have it both ways). but it multiplies the 
problems with (2) above, as it has 20 domains rather than 6.

i've discussed the above with a few people, including Steve, Schyler 
and Jo, but no real consensus was reached. are we comfortable with 
all these approaches? does it really matter?

one reason it might matter is that each system allows (in increasing 
order) the ability to find connected regions in decreasing time. this 
will be important for doing track/road matching to find duplicates or 
dependancies or minimal deviation lists, convex hulls.

personally, i prefer (3), but this may change if it proves difficult 
to implement ;-)

cya,

matt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 190 bytes
Desc: not available
URL: <http://lists.openstreetmap.org/pipermail/talk/attachments/20041118/b4948aca/attachment.pgp>


More information about the talk mailing list