[OSM-dev] Possible GSoC project: tag/area monitoring service

Wed Mar 7 04:11:40 GMT 2012

> First, a longstanding wishlist item for OSM has been "data tiles",
> that is the API data, split into preset sized areas (eg z14), which a
> client could call. This may not seem relevant to your project but
> you'll see why it is soon.

This was actually part of my original motivation for proposing this project -- in my 2010 GSoC project, I used bbox queries to load data in tile-like sections, but, as I mentioned, this turned out to be very slow. Data tiles seem like they could speed up that sort of use. Ideally, accessing a data tile would take about as much work as accessing an image tile. It also seems easier to cache data addressed by tile than to cache the results of arbitrary bbox queries, since every client asking for a given tile asks for exactly the same thing.
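For reference, a data tile could be addressed the same way image tiles are. The following is a minimal Python sketch of the standard Web Mercator ("slippy map") tile formulas, assuming data tiles would reuse that scheme. `get_data_tile` and its `fetch_bbox` callback are hypothetical names standing in for whatever bbox query backs a tile. Because the cache key is just `(zoom, x, y)`, every client requesting the same tile hits the same cache entry.

```python
import math


def latlon_to_tile(lat, lon, zoom=14):
    """Return (x, y) of the Web Mercator ("slippy map") tile containing a point."""
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp lon == 180 and latitudes beyond the projection limit into the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_bbox(x, y, zoom=14):
    """Return (min_lon, min_lat, max_lon, max_lat) covered by a tile."""
    n = 2 ** zoom

    def lon(xt):
        return xt / n * 360.0 - 180.0

    def lat(yt):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * yt / n))))

    # Tile y grows southwards, so the northern edge is row y and the southern edge is row y + 1.
    return lon(x), lat(y + 1), lon(x + 1), lat(y)


# Hypothetical cache: one entry per tile, filled by whatever bbox query backs it.
_tile_cache = {}


def get_data_tile(x, y, zoom, fetch_bbox):
    key = (zoom, x, y)
    if key not in _tile_cache:
        _tile_cache[key] = fetch_bbox(*tile_bbox(x, y, zoom))
    return _tile_cache[key]
```

With this, the expensive bbox query for a tile runs at most once until the cache entry is invalidated.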

I'd also be interested in working on data tiles -- is that in itself a reasonable project idea? My hope is that if either of these ideas is something people have been wanting for a while, they'll use it, and a project that has people using it is more likely to still be around after the summer.

One thing I was wondering about -- how do you choose a tile size that keeps both the number of requests low (which favors larger tiles) and the size of each tile in bytes low (which favors smaller tiles)? Some areas have a much higher density of data than others. Perhaps some kind of quadtree approach could be used, where a tile is split into smaller tiles if it contains too much data?
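A minimal sketch of the quadtree idea, assuming node positions have already been converted to tile coordinates at some maximum zoom: a tile is split into its four children whenever it holds more than `max_points` nodes, stopping at `max_zoom`. The function name and thresholds are illustrative, not an existing OSM API. The result is a list of tiles of mixed zoom that covers the starting tile exactly once.

```python
def split_tiles(points, zoom, x, y, max_points, max_zoom):
    """Recursively split tile (zoom, x, y) until each leaf holds <= max_points nodes.

    `points` are the (x, y) tile coordinates at `max_zoom` of the nodes inside
    this tile. Returns a list of (zoom, x, y) leaf tiles that exactly cover it.
    """
    if len(points) <= max_points or zoom >= max_zoom:
        return [(zoom, x, y)]
    # At zoom + 1, a node's ancestor tile is its max_zoom coordinate shifted right.
    shift = max_zoom - (zoom + 1)
    children = {(2 * x + dx, 2 * y + dy): [] for dx in (0, 1) for dy in (0, 1)}
    for px, py in points:
        children[(px >> shift, py >> shift)].append((px, py))
    leaves = []
    for (cx, cy), child_points in sorted(children.items()):
        leaves.extend(split_tiles(child_points, zoom + 1, cx, cy, max_points, max_zoom))
    return leaves
```

Dense areas would end up as many small tiles and sparse areas as a few large ones. One catch is that a client needs to know the current split to find the right tile, so the split would probably have to be published and changed only rarely.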

The ideas you suggest for streaming-type updates on data tiles are very interesting. If you were writing an editor, you could be more certain that you were displaying the most recent data without having to reload all of it.
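For the editor case, one possible design (a sketch, not how any existing OSM service works) is to give every change a global, increasing sequence number and keep a per-tile log. A client remembers the last sequence number it has applied for each tile and, after reconnecting, asks only for newer entries instead of reloading the whole tile. `TileChangeLog` and its methods are hypothetical.

```python
import bisect
from collections import defaultdict


class TileChangeLog:
    """Hypothetical per-tile change log with global, increasing sequence numbers."""

    def __init__(self):
        self._next_seq = 1
        self._seqs = defaultdict(list)     # tile -> sequence numbers, ascending
        self._changes = defaultdict(list)  # tile -> changes, parallel to _seqs

    def record(self, change, tiles):
        """Append one change to the log of every tile it touches; return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        for tile in set(tiles):
            self._seqs[tile].append(seq)
            self._changes[tile].append(change)
        return seq

    def changes_since(self, tile, last_seen_seq):
        """Return [(seq, change), ...] for a tile, keeping only seq > last_seen_seq."""
        seqs = self._seqs.get(tile, [])
        # Sequence numbers are appended in increasing order, so bisect finds the first unseen one.
        start = bisect.bisect_right(seqs, last_seen_seq)
        return list(zip(seqs[start:], self._changes.get(tile, [])[start:]))
```

An editor that had applied everything up to sequence 1 for a tile would call `changes_since(tile, 1)` after a dropped connection, apply only what comes back, and move its bookmark to the highest sequence number returned.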

> While you could use Changepipe to make arbitrary polygons and then
> stream the changes, IMHO this is not as generally useful as one might
> imagine. Network hiccups alone can mean that it's possible to miss an
> event. And arbitrary polygons become "complicated" as the number of
> queues can be large.

I hadn't thought about using arbitrary polygons to specify areas as it seemed too complex -- would there be much call for that? I assume the use cases would be things like keeping track of updates to a city (the area of which isn't always conveniently specified as a bounding box).

> By splitting the areas up, you can now take a changeset and know which
> areas (tile) it affects. And then each client can simply subscribe to
> an area (tile). You've greatly simplified the problem, whether you
> allow for arbitrary shapes (one shape -> many tiles) or 1:1 tiles to
> connections.
> 
> Now, to your original question... Another advantage of "tiling" the
> data is you can easily do both. Each tile can have a list of changes
> associated with it. If you tried to do this on arbitrary polygons,
> it'd get difficult very quickly.
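A minimal sketch of the subscription side, assuming a changeset can be reduced to the (lat, lon) of the nodes it touches (a real changeset would also need, for example, the old positions of moved or deleted nodes). The set of affected tiles is computed once per changeset and then fanned out to every client subscribed to those tiles. `TileHub` is a hypothetical name.

```python
import math
from collections import defaultdict


def latlon_to_tile(lat, lon, zoom):
    """Web Mercator ("slippy map") tile containing a point."""
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


class TileHub:
    """Hypothetical broker: clients subscribe to tiles, changesets are fanned out per tile."""

    def __init__(self, zoom=14):
        self.zoom = zoom
        self.subscribers = defaultdict(set)  # (x, y) -> client ids

    def subscribe(self, client_id, tile):
        self.subscribers[tile].add(client_id)

    def affected_tiles(self, node_coords):
        """Tiles touched by a changeset, given the (lat, lon) of the nodes it changes."""
        return {latlon_to_tile(lat, lon, self.zoom) for lat, lon in node_coords}

    def publish(self, node_coords):
        """Return {client_id: tiles to refresh}. Each affected tile is computed once,
        no matter how many clients are watching it."""
        deliveries = defaultdict(set)
        for tile in self.affected_tiles(node_coords):
            for client in self.subscribers.get(tile, ()):
                deliveries[client].add(tile)
        return dict(deliveries)
```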

This makes sense, as I guess it means there are fewer bins to put things in when an update needs to be sent out to clients. (You only have to do the work once if several clients are looking at a particular tile.) And, if a client really did want to look at an arbitrary polygon, maybe it could "rasterize" the polygon into a list of tiles.
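Rasterizing a polygon into tiles could look roughly like this: convert the polygon to fractional tile coordinates, then keep each tile in its bounding box that overlaps the polygon, meaning a polygon vertex lies in the tile, a tile corner lies inside the polygon, or a polygon edge crosses a tile edge. This is a minimal sketch assuming a single ring with no holes and the standard slippy-map tile scheme; `polygon_to_tiles` and its helpers are hypothetical names, and polygons that only touch a tile along its boundary may or may not be counted.

```python
import math


def to_tile_space(lat, lon, zoom):
    """Fractional Web Mercator tile coordinates: x grows east, y grows south."""
    n = 2 ** zoom
    x = (lon + 180.0) / 360.0 * n
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n
    return x, y


def point_in_polygon(px, py, poly):
    """Even-odd ray casting: count polygon edges crossed by a ray going east from (px, py)."""
    inside = False
    j = len(poly) - 1
    for i in range(len(poly)):
        (xi, yi), (xj, yj) = poly[i], poly[j]
        if (yi > py) != (yj > py):
            x_cross = xi + (py - yi) * (xj - xi) / (yj - yi)
            if px < x_cross:
                inside = not inside
        j = i
    return inside


def segments_cross(a, b, c, d):
    """True if segments ab and cd properly cross (touching endpoints are ignored)."""
    def orient(p, q, r):
        return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return orient(c, d, a) * orient(c, d, b) < 0 and orient(a, b, c) * orient(a, b, d) < 0


def polygon_to_tiles(ring, zoom):
    """Return the set of (x, y) tiles at `zoom` that a polygon overlaps.

    `ring` is a list of (lat, lon) vertices, without repeating the first vertex at the end.
    """
    n = 2 ** zoom
    poly = [to_tile_space(lat, lon, zoom) for lat, lon in ring]
    edges = [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    tiles = set()
    # Only tiles inside the polygon's bounding box can overlap it.
    for tx in range(max(0, math.floor(min(xs))), min(n - 1, math.floor(max(xs))) + 1):
        for ty in range(max(0, math.floor(min(ys))), min(n - 1, math.floor(max(ys))) + 1):
            corners = [(tx, ty), (tx + 1, ty), (tx + 1, ty + 1), (tx, ty + 1)]
            sides = [(corners[k], corners[(k + 1) % 4]) for k in range(4)]
            overlaps = (
                any(tx <= x <= tx + 1 and ty <= y <= ty + 1 for x, y in poly)        # vertex in tile
                or any(point_in_polygon(cx, cy, poly) for cx, cy in corners)          # corner in polygon
                or any(segments_cross(a, b, c, d) for a, b in edges for c, d in sides)  # edges cross
            )
            if overlaps:
                tiles.add((tx, ty))
    return tiles
```

The third test matters for long, thin areas such as a river or a road corridor, which can pass through a tile without leaving a vertex in it.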

A similar approach could perhaps be used for people who are interested in updates to tags -- in that case, I guess the analogue of a tile would be a particular value or set of values for a tag.
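For tags, the "bins" could be keyed by `(key, value)` pairs, plus a wildcard `(key, None)` for clients that want every value of a key. In this sketch (the `TagHub` class is hypothetical), a change is routed using both the old and the new tags, so someone watching `amenity=pub` also hears about a pub that gets retagged as a cafe.

```python
from collections import defaultdict


class TagHub:
    """Hypothetical tag-based broker. A bin is (key, value), or (key, None) for any value."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # bin -> client ids

    def subscribe(self, client_id, key, value=None):
        self.subscribers[(key, value)].add(client_id)

    @staticmethod
    def bins_for(old_tags, new_tags):
        """Bins touched by a change, from the element's tags before and after."""
        bins = set()
        for tags in (old_tags, new_tags):
            for key, value in tags.items():
                bins.add((key, value))
                bins.add((key, None))
        return bins

    def publish(self, old_tags, new_tags):
        """Return the set of client ids to notify about this change."""
        clients = set()
        for b in self.bins_for(old_tags, new_tags):
            clients |= self.subscribers.get(b, set())
        return clients
```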

-- Michael