[OSM-dev] Possible GSoC project: tag/area monitoring service

Serge Wroclawski emacsen at gmail.com
Wed Mar 7 15:49:57 GMT 2012


We could take this off-list but I think this may still be of interest
to the general community.

On Tue, Mar 6, 2012 at 11:11 PM, Michael Daines <michael at mdaines.com> wrote:
>> First, a longstanding wishlist item for OSM has been "data tiles",
>> that is the API data, split into preset sized areas (eg z14), which a
>> client could call. This may not seem relevant to your project but
>> you'll see why it is soon.
>
> This was actually part of my original motivation for proposing this project -- in my 2010 GSoC project, I used bbox queries to load data in tile-like sections, but as I mentioned this turned out to be very slow.

There are a number of ways to do this intelligently. I was going to
write up a very naive prototype that had no brains at all, and here's
what my approach was going to look like (and I'll do it if there's
interest):

Write some code to query jaxpi for bounding boxes in Python based on tile name.
Use this and write "Data tile" support in TileStache. I'd store cached
tiles in Redis (for reasons that become apparent in a few sentences).
I'd use the parsing/storing bits of Changepipe to tell me which tiles
are affected by a changeset (even though I believe it uses the
changeset's bbox, which is oftentimes wrong).
Since Changepipe is already using Redis, using Redis for the tiles makes sense.
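
As a rough sketch of the first and third steps, here is the standard
slippy-map tile arithmetic for turning a tile name into a bounding box,
and a changeset bbox into the tiles it touches. The function names and
the z14 default are just for illustration; they don't come from
TileStache or Changepipe.

```python
import math

def tile_bbox(z, x, y):
    """Return (west, south, east, north) in degrees for tile z/x/y."""
    n = 2 ** z

    def lon(xt):
        return xt / n * 360.0 - 180.0

    def lat(yt):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * yt / n))))

    # Tile rows count downward from the north, so row y + 1 is the south edge.
    return (lon(x), lat(y + 1), lon(x + 1), lat(y))

def tile_for(lon, lat, z):
    """Return the (x, y) tile containing a lon/lat point at zoom z."""
    n = 2 ** z
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2 * n)
    # Clamp so points on the east/south edge stay inside the grid.
    return (min(max(x, 0), n - 1), min(max(y, 0), n - 1))

def tiles_for_bbox(west, south, east, north, z=14):
    """List every (z, x, y) tile that a bounding box overlaps."""
    x0, y0 = tile_for(west, north, z)   # top-left tile
    x1, y1 = tile_for(east, south, z)   # bottom-right tile
    return [(z, x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]
```

The (west, south, east, north) ordering matches the left,bottom,right,top
ordering used for bbox parameters in OSM API calls, so the result of
tile_bbox() can be joined with commas for a bbox query.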

And then the issue would be how to hack in some code for the
websocket/stream/whatever. This seems like it'd be relatively simple
using Redis pubsub and something like gevent, but I haven't looked
into it.
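
To make the idea concrete, here is a minimal sketch of the
invalidate-and-notify flow. It assumes one channel per tile, and it uses
a small in-memory broker as a stand-in so it runs without a Redis
server. The broker class, the "tile:z/x/y" channel naming, and the
message shape are all made up for illustration; none of this is the
Redis or gevent API.

```python
from collections import defaultdict

class InMemoryBroker:
    """Stand-in for a pub/sub service such as Redis pubsub (hypothetical)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver message to each subscriber; return how many received it."""
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)

def channel_for(tile):
    """Hypothetical channel naming scheme: one channel per data tile."""
    z, x, y = tile
    return "tile:%d/%d/%d" % (z, x, y)

def invalidate(tiles, cache, broker, changeset_id):
    """Drop cached tiles touched by a changeset and notify their listeners."""
    notified = 0
    for tile in tiles:
        cache.pop(tile, None)  # the next request re-fetches the tile
        message = {"tile": tile, "changeset": changeset_id}
        notified += broker.publish(channel_for(tile), message)
    return notified
```

A websocket handler would subscribe to the channels for the tiles its
client is viewing and forward each message down the socket.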

The right answer would be to keep a local copy of the database and
then update it as necessary. I believe Ian Dees has a copy of some
MongoDB code that uses quadtile to index OSM objects (I'm very fuzzy
on the details). (Update: Ian sent me this URL, but I haven't taken a
look: https://github.com/iandees/mongosm/commit/c46c2081edde0b3b2b0446dd06d5ef02b292631c
)

Then, as objects change, you'd be able to update the affected tiles.

> I'd also be interested in working on data tiles -- is that in itself a reasonable project idea?

I think that would be welcome, especially if done well. My naive
approach would be slow, but if you used a different approach that
didn't keep hitting external servers on every update, it'd be a very
nifty project indeed.

> One thing I was wondering about -- how do you choose a tile size to minimize both the number of accesses (larger tiles) and the byte size of tiles (smaller tiles)? Some areas have a much higher density of data than others. Perhaps some kind of quadtree-type approach could be used, where tiles are split if they have high density?

That'd certainly work. I'd started with a naive approach of "If I only
have one zoom level, things are easy", and then you just accept that
some areas are dense, and others not. At the same time, there won't be
as much demand for low density areas.

There's certainly value in cleverness and not transmitting too much
data, but there's also value in simplicity for clients.
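
For comparison with the single-zoom approach, here is a small sketch of
the quadtree idea: start from one tile and keep splitting any tile that
holds more than some number of objects. It assumes points are already
projected into the unit square (0 <= x, y < 1, tile-grid orientation),
and the max_points threshold and max_zoom cap are arbitrary
illustrative parameters.

```python
def split_by_density(points, max_points, z=0, x=0, y=0, max_zoom=18):
    """Return [((z, x, y), count), ...] leaf tiles covering tile z/x/y.

    points must all lie inside tile z/x/y. A tile is split into its four
    children while it holds more than max_points and z < max_zoom.
    """
    if len(points) <= max_points or z >= max_zoom:
        return [((z, x, y), len(points))]

    n = 2 ** (z + 1)  # grid size at the child zoom level
    children = {}
    for px, py in points:
        key = (min(int(px * n), n - 1), min(int(py * n), n - 1))
        children.setdefault(key, []).append((px, py))

    leaves = []
    for dx in (0, 1):
        for dy in (0, 1):
            cx, cy = 2 * x + dx, 2 * y + dy
            # Empty children are kept so the leaves still cover the area.
            leaves.extend(split_by_density(children.get((cx, cy), []),
                                           max_points, z + 1, cx, cy,
                                           max_zoom))
    return leaves
```

The cost for clients is that they can no longer compute a tile name
from a coordinate alone; they need some index of which leaf tiles
exist.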

I think with compression or binary formats like pbf, the need for
cleverness is reduced since there's overall less data transmitted.

- Serge


