[OSM-dev] Proposal: Database accelerator and dirty tile marker based on simple algorithm.

Sat Sep 16 18:22:57 BST 2006

[inline]

On 16/09/06, Nick Hill <nick at nickhill.co.uk> wrote:
>
> Hello Nigel
>
> I remember reading your quadtree post.
>
> You could, for example, use the two most significant bits to represent
> each quadrant, then the two next most significant bits to select  the
> next level quadrant and so on. With a 32 bit int, you can do this 16
> times, dividing the planet into 4Bn portions, much like the tiles I
> suggested.

Yes; whether you use a pair of bits or a letter doesn't make any difference
to the algorithm.

I think using an int and bits scheme would be wildly more efficient than
> using characters.

It is in space, but not neccesarily in time. When I tested on a couple of
databases a while back, char(16) indices significantly outperformed integer
range checking. (Where a text index you do something like 'select * from
nodes where qt like 'D%' vs 'select * from nodes where qt >= 12 and qt <=
15).

The important thing is having a tile address. However it's done underneath,
I think it's easier for users to be able to look at tile addresses not range
(so we ought to be able to do something like /api/map?tile=dcb rather than
/api/map?tile=a,b). I believe google use something similar for some of their
tile data.

Extending the idea I proposed, If each object was associated with a
> tile, each object would only need an additional single 32 bit int to
> locate to within 1cm. The algorithm to derive lat/lon would be extremely
> computationally simple and fast- a few steps of a shift register. In
> fact, faster than the transport of data that they will derive.

No - objects potentially need many additional points. A segment going from
tile(a) to tile(b) may traverse tile(c), so this information needs to be
stored. (In this instance, a and b are stored by nature of the node, it's
only the additional tiles traversal (or enclosure for areas) information you
need to store).

I haven't yet considered your quadtree proposal in enough depth to fully
> compare the two approaches, but there are many similarities.
>
> My proposal could potentially be mimicked by MySQL by selecting partial
> indexing. For example, indexing only the first 16 bits of each lat and
> lon.

I don't know MySQL that well (and its somewhat inconsistent performance
characteristics); if it's similar to index file partitioning in Oracle,
there may be some gains to be had.

Of course, once (if) we get tile addresses in, there's nothing to stop the
map server caching API queries and not going back to the database at all,
since it will be able to determine whether anything stored in it's local
store has changed. A straight hashmap of tile address->query result xml
would do this...

Nigel Magnay wrote:
> > yes - use tiles, but I'd rather use quadtiles, as they have nicer
> > properties (particularly you can fetch the super-tile of 4 child tiles
> > incredibly easily. Calculating which tile a point is in is not very
> > hard; I've got a bit of java lying around, but it's easy enough to
> > understand.
> >
> > Incidentally, though I didn't mention it at the time, this scheme is
> > particularly good for databases that support data file partitioning.
> >
> > Here's the post I sent a while ago:
> >
> >  > [sniiip]
> >  > I'd be interested to hear from computer scientists about fancy
> > algorithms for
> >  > spacially-sorting the data, doing indexes, etc. First person to
> mention
> >  > Hilbert space gets a nerd of the month award.
> >  >
> >
> > Quadtrees. We need to be doing this in OSM in general, and it would
> > make things much simpler.
> >
> > To illustrate - divide the world into 4 quadrants. Call the top-left
> > A, top-right B, bottom-left C, bottom-right D. Then for each quadrant,
> > do the same. thus the grid is now
> >
> > AA AB BA BB
> > AC AD BC BD
> > CA CB DA DB
> > CC CD DC DD
> >
> > each time you repeat this process (and you can do this as much as you
> > like), you halve the area in each addressable tile. For around 13
> > characters, you get (IIRC) 100m of resolution if you started with the
> > entire globe (may be wrong, memory is hazy).
> >
> > So, just thinking about nodes, we store along with each (long, lat)
> > coordinate the quadtree address. Instead of querying, say,
> >
> > select * from nodes where lat>? and lat<? and lon>? and lon<?
> >
> > we just do (say)
> > select * from nodes where quadtree='AAABDBA'
> >
> > rather neatly, if you wanted to zoom out to the next level of detail,
> > you can do
> >
> > select  * from nodes where quadtree like 'AAABDB%'
> >
> > Which will get any node in AAABDBA, AAABDBB, AAABDBC and AAABDBD. This
> > is a single index lookup, and is very fast.
> >
> > This also solves another sticky problem. Consider segments (ways are
> > just the same on a larger scale). You have a segment going from AB to
> > BC. Using the current 'get nodes in area, get missing nodes in
> > segments' method, if you fetch the area AD, you'll miss the fact that
> > there is this segment going through it. By storing for each
> > way/segment the list of tiles it intersects, the segments query is
> > also simplified to 'select * from segments where quadtree like 'xyz%'.
> >
> > This is particularly important for areas - for example, you could be
> > viewing a tile in the middle of a forest whose edge does not actually
> > intersect with your tile at all.
> >
> > This makes the map query somewhat easy, just a sequence of
> > select * from nodes where quadtile like ?
> > select * from segments join segment_quadtile on
> > segment.id=segment_quadtile
> > .quadtile where segment_quadtile like ?
> > etc., etc.
> >
> > Using quadtree addresses is also useful for external applications. Say
> > OSM had an RSS feed of changes that listed which quadtiles have been
> > changed. Map renderers can thus use this information to invalidate
> > their cache for just that tile, and re-render, without having to check
> > for changes each time. I could 'register' for information if anything
> > changed in a list of quadtile addresess (say, those covring oxford),
> > and it would be very easy to calculate when something I would be
> > interested in had happened.
> >
> > This isn't very complicated and could be done relatively quickly. I
> > have most of a java implementation buried on my harddrive somewhere...
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/dev/attachments/20060916/2636e55c/attachment.html>