[Openstreetmap] Large data structure formats? (was: How do we handle large amount of GPS points?)
Matt Amos
matt at matt-amos.uklinux.net
Thu Nov 18 00:45:39 GMT 2004
On Thursday 18 November 2004 00:12, Petter Reinholdtsen wrote:
> So, I managed to get in touch with one linux user in Oslo, which
> has been driving around in the area with a GPS and a laptop for
> quite some time now. He got >7 million GPS coordinates, which he
> uses to create a bitmap map of Oslo. He was very positive to the
> idea of sharing this data set with us, and I got a copy of his
> current database.
cool! is it annotated?
> I guess my question is really, how should we handle such wast
> amount of points? Steve, do you want 7 million points from Norway
> in your database? Ready to handle it? How do you want it?
yes!
as far as i see it, the post/my-sql databases should be able to handle
this sort of load and search it pretty quickly - thats what databases
are for, right?
on the software level i forsee some sort of partitioning system. the
thoughts so far are:
1) stick to lat/long, with a standard quadtree. this presents some
problems as the element size is completely variable depending on how
far north you are, but there arent that many roads at the poles,
right?
2) use the projection of a cube onto a sphere (oblate, if you like) to
provide a 6-way quadtree. this has the disadvantage that it also has
badly-shaped elements near the cube corners and will have no explicit
connectivity between the six initial patches. however, it has no
singularities.
3) use a Dymaxion projection-based system for a hierarchichal
triangular mesh. this is also a quadtree, but more difficult to
handle both mentally and programmatically. it also has no
singularities and (certain formulations) guarantee no distortion of
element size, but distortion of element quality or vice-versa (it
appears that you can't have it both ways). but it multiplies the
problems with (2) above, as it has 20 domains rather than 6.
i've discussed the above with a few people, including Steve, Schyler
and Jo, but no real consensus was reached. are we comfortable with
all these approaches? does it really matter?
one reason it might matter is that each system allows (in increasing
order) the ability to find connected regions in decreasing time. this
will be important for doing track/road matching to find duplicates or
dependancies or minimal deviation lists, convex hulls.
personally, i prefer (3), but this may change if it proves difficult
to implement ;-)
cya,
matt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 190 bytes
Desc: not available
URL: <http://lists.openstreetmap.org/pipermail/talk/attachments/20041118/b4948aca/attachment.pgp>
More information about the talk
mailing list