[OSM-dev] Cartagen - client-side vector based map renderer, dynamic maps

Tels nospam-abuse at bloodgate.com
Fri May 8 20:00:28 BST 2009


Moin,

On Friday 08 May 2009 20:04:48 you wrote:
> > * The proxy receives XML from the api or xapi server. Currently it
> > requests the full dataset.
> > * Then it removes unnecessary tags (like note, fixme, attribution and a
> > whole bunch of others that are not needed for rendering). Some of
> > them are very minor, but 10000 nodes with "attribution=veryvery
> > long string here" can make up like 40% of all the data, and just
> > clog the line and browser :)
>
> Yes, I'm thinking of trying to cache locally but still request
> changesets if the ?live=true tag is set... caching locally is great
> for more static data but for the live viewer, I'm trying to not use
> caching, but increase efficiency in the requests.

I fear loading data live from the API server is just not feasible, 
unless you:

* only load diffs (minute-diffs?) and update the data already cached at 
the proxy with them (a rough sketch follows after this list). OTOH I 
read that importing a one-hour diff into a postgres database can take 
40..70 minutes, i.e. depending on load you might not even manage to 
update your DB with the diffs fast enough...
* invent an API server that is about 1000 times faster :)
* never zoom out from level 18; anything below that requests so much 
data that you can't get it live :)
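
To illustrate the first option: once the osmChange XML of a diff is 
parsed into create/modify/delete lists, applying it to a cached element 
table is conceptually just this (a rough JavaScript sketch; the real 
format also carries versions and more detail than shown here):

  // ids are only unique per element type, so key the cache by type+id.
  function cacheKey(el) { return el.type + '/' + el.id; }

  // Apply a parsed osmChange diff ({create: [...], modify: [...],
  // delete: [...]}) to a cache of elements keyed by cacheKey().
  function applyDiff(cache, diff) {
    diff.create.concat(diff.modify).forEach(function (el) {
      cache[cacheKey(el)] = el;     // the new version replaces the old
    });
    diff['delete'].forEach(function (el) {
      delete cache[cacheKey(el)];   // element is gone upstream
    });
  }

The hard part is not this loop, it is parsing and importing fast enough 
to keep up with the diff stream.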

Currently I consider "live view" not an achievable goal; I am happy if I 
can render data that is about 1 day or so old.

> > * The data is then pruned into (currently 3) levels and stored in a
> > cache:
> >  * level 0 - full
> >  * level 1 - no POI, no paths, streams, tracks etc. used for zoom 11
> >  * level 2 - no tertiary roads etc. used for zoom 10 and below
> > * The client is served the level it currently requested as JSON.gz.
>
> Great, this is what I'm working on too. I'm thinking a ruleset about
> what features are relevant for what zoom levels could be something to
> work together on? I was also thinking of correlating tags with a
> certain zoom level. But maybe each tag should be associated with a
> range of zoom levels, like "way: { zoom_outer: 3, zoom_inner: 1 }".
> Thoughts?

My rules do have a minimum zoom level; below that, a feature is not 
rendered. The levels are inspired by the Osmarender and Mapnik 
outputs, but I moved a few of them down so you can render really high 
resolution maps.
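
To make that concrete, a rule with a minimum zoom level looks roughly 
like this (a simplified sketch; my real rules carry more styling, and 
your zoom_outer/zoom_inner range idea would slot in at the same place):

  // Illustrative rules: a feature only renders at or above min_zoom.
  var rules = [
    { match: { highway: 'tertiary' }, min_zoom: 12,
      style: { strokeStyle: '#ffffcc', lineWidth: 3 } },
    { match: { waterway: 'stream' },  min_zoom: 13,
      style: { strokeStyle: '#b5d0d0', lineWidth: 1 } }
  ];

  // Return the rules that apply to a way at the given zoom level.
  function rulesFor(way, zoom) {
    return rules.filter(function (r) {
      if (zoom < r.min_zoom) return false;  // below the minimum: skip
      for (var key in r.match) {
        if (way.tags[key] !== r.match[key]) return false;
      }
      return true;
    });
  }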

However, the pruning at the proxy is something else and not connected to 
that. For instance, somebody might not want to see tertiary roads on 
level 13, but others might. So I make sure to only prune out data that 
can never be seen at that level, i.e. conservative pruning.

Also, about 90% of the data-pruning is about removing unwanted data 
(like "note=blah" :) and not about the smaller zoom levels, because 
currently it is simply not feasible to render below zoom 10, and even 
for zoom 10 you need a really beefy machine and a long wait time...
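
The pruning idea itself fits in a few lines (a sketch; the real 
junk-tag list and the per-level cut-offs are longer than shown here):

  // Tags that can never influence rendering, so the proxy strips them.
  var JUNK_TAGS = { note: 1, fixme: 1, attribution: 1, created_by: 1 };

  // Conservative pruning: only drop what can never be seen at a level.
  function prune(features, level) {
    var kept = [];
    for (var i = 0; i < features.length; i++) {
      var f = features[i];
      if (level >= 1 && f.tags.amenity) continue;  // level 1: no POIs
      if (level >= 2 && f.tags.highway === 'tertiary') continue;
      for (var tag in f.tags) {
        if (JUNK_TAGS[tag]) delete f.tags[tag];
      }
      kept.push(f);
    }
    return kept;
  }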

> > * There are three servers in the list (api.openstreetmap,
> > xapi.informationfreeway and tagwatch) and a lot of them do not
> > complete the request (internal error, not implemented etc. etc.).
> > It can take a lot of retries to finally get the data.
> > * Even when you get the data, it takes seconds (10..40 seconds
> > is "normal") to minutes - upwards to 360 seconds just to serve one
> > request.
> >
> > So currently all received data is stored in the cache for 7 days to
> > avoid the very very long loading times.
> >
> > Ideas of fetching the full dataset and pre-computing the cache
> > simply don't work because I don't have a big enough machine and no
> > big enough online account to store the resulting JSON :(
> >
> > Also, somehow processing 150 Gbyte XML into JSON will prove to be a
> > challenge :)
>
> So I'm having the same problems with the APIs. The standard 0.6 api
> has been pretty good but of course it serves XML, not JSON. The xapi
> is not very responsive to me, it seems. 

It isn't for me either, but the API server is very slow, too. It seems 
it can't manage to send me more than 17 Kbyte/s (but maybe it is 
bandwidth-limited?).

> I thought parsing XML in JS 
> would be molasses, 

When I tried it, it used ungodly amounts of memory (because the data 
structure is not useful for rendering and contains so much cruft), and 
I also never managed to extract the actual node data for rendering 
from it...

> so if you're interested, we should put up our own 
> XAPI or custom api off the planet.osm file, and send JSON?

Yeah, that was my plan for the near future :) For now I am happy with my 
proxy, as there is quite enough to do on the client side whether the 
data is current/real-time or 1 day old :)

> I have a quad-core Intel Mac Pro with 1.5 TB and a bunch of RAM we
> can dedicate to this effort, with plenty of bandwidth. And perhaps
> when Stefan's work is published, we could run it as well, since it
> seems to be a great solution to requesting fewer nodes for large
> ways... but for now do you think you could use an XAPI? I think all
> my requests fit into that api.

Currently I am just requesting the full data and then pruning it 
myself, because I am not actually sure it would help if we did either:

* request partial data (all streets, all landuse), simply because at a 
high enough zoom level you need the full data anyway
* request ways with fewer nodes, because that is only good for low 
zooms and I am currently sort of ignoring them :) It is basically a 
side-problem.

> Alternatively, Stefan points out that the dbslayer patch for the
> Cherokee server allows direct JSON requests to a database. So some
> very thin db wrapper might serve us for now? This isn't my area of
> expertise, so if you have better ideas on how to generate JSON direct
> from the db, like GeoServer or something, and still have tag-based
> requests, I'm all ears.

Well, I am not sure that this would be faster or better. If the db-json 
served the full API data, we would also get all the "junk" data like 
"note" and so on, and this would overwhelm the browser. So it might 
need a filter, too.

Also, my renderer expects the format currently spewed by my proxy. If we 
used Stefan's format, it wouldn't work (multipolygons are one reason) 
and it would be a lot of work to switch the code.

OTOH, I would not complain if somebody invents a server that spits out 
JSON in the right format and in real-time :)

> > Yes, but reducing the polygons is also a lot of work :) I haven't
> > started on this yet, because on zoom 12 or higher you need to
> > render almost anything, anyways. Plus, you would then need to cache
> > the partial data somehow (computing it is expensive in JS..)
>
> Seems like Stefan's work may address this, no? Or if we did cache it,
> seems like we'd calculate it on the server side.

I was kinda hoping to build a client-side application, not something 
that runs on the server :) If the server has to reduce the polygons, it 
might never be able to process the whole planet.

But I see the point. :)

(I was, for instance, pondering whether the JSON from the server should 
already contain BBOX data for each way. I decided against it: it uses 
bandwidth and server CPU, and is very fast to compute on the client 
anyway. But definitely a few things can be precomputed at the server 
and stored in the cache. One example is the multipolygon relationships. 
Their presentation in XML isn't actually very usable, so I just rewrite 
it so that the client can access it super-fast.)
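
Computing the BBOX on the client really is just one cheap pass over the 
way's nodes, which is why I left it out of the JSON. A sketch (my real 
data structures differ in detail):

  // Bounding box of a way, given a lookup table of its nodes.
  function bboxOf(way, nodes) {
    var minLat = 90, maxLat = -90, minLon = 180, maxLon = -180;
    for (var i = 0; i < way.nodes.length; i++) {
      var n = nodes[way.nodes[i]];
      if (n.lat < minLat) minLat = n.lat;
      if (n.lat > maxLat) maxLat = n.lat;
      if (n.lon < minLon) minLon = n.lon;
      if (n.lon > maxLon) maxLon = n.lon;
    }
    return { minLat: minLat, maxLat: maxLat,
             minLon: minLon, maxLon: maxLon };
  }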

> > > d) oh, and localStorage. I've partially implemented that but
> > > haven't had much testing... other work... ugh. So caching on a
> > > few levels, basically.
> >
> > I fail to see what localStorage actually gains, as the delivered
> > JSON is put into the browser cache anyway, and the rest is cached
> > in memory. Could you maybe explain what your idea was?
>
> Yes, localStorage persists across sessions so you could build up a
> permanent local cache and have more control (in JS) over requesting
> it and timestamping when you cached it, not to mention applying only
> changesets and not complete cache flushes. This has some advantages
> over the browser cache, although that does of course persist across
> sessions too.

But it won't help if you move to a different machine. Also, it goes 
against the "live" idea; we would need to query the server for new 
data anyway. Currently, if you reload a map session, most of the time 
is spent in the rerender, and almost none in loading the data over the 
net.

I guess if I write a 100x faster renderer, that might change, but I'd 
like to work on one problem at a time :)

So for now I'd like to keep localStorage out, as it creates more 
problems than it solves :)

> > * There is a talk I proposed for State of the Map and I don't want
> > to spoil everything before :)
>
> yes, me too! so if you want to discuss off-list that's fine.

Heh, you have a talk scheduled, too? :) That sounds like fun :)

> > Of course, semi-dynamic rules like "color them according to feature
> > X by formula Y" are still useful and fun, and avoid the problems
> > above. (Like: "use maxspeed as the color index ranging from red
> > over green to yellow" :).
>
> Yes, this is an exciting area to me, for example the color by
> authorship stylesheet I posted before:
>
> http://map.cartagen.org/find?id=paris&gss=http://unterbahn.com/cartagen/authors.gss
>
> or this one I threw together yesterday, based on the tags of measured
> width instead of on a width rule:
>
> http://map.cartagen.org?gss=http://unterbahn.com/cartagen/width.gss
>
> A more fully-rendered screenshot is here:
>
> http://www.flickr.com/photos/jeffreywarren/3510685883/

Yeah, that is what I have in mind, too. But so many things to do, so 
little time :)

> Anyways, thanks for sharing; one thought I had was that besides
> sharing ideas and solutions online, we should try *different*
> approaches, so that we try all the possibilities. I think multiple
> projects working on the same problem can sometimes be redundant, but
> more often it's beneficial for all parties since there's a diversity
> of approaches to a problem. Let's take advantage of that by
> specifically attempting different solutions to the problems we face,
> and discussing the results... if you're willing. If one of us tries a
> technique and it doesn't work, we can all learn from the attempt.

Sure, I am working on my ideas anyway :) A few things you might find 
interesting:

* there are no dashed lines on canvas, you need to roll your own (see 
the sketch below)
* rendering 60000 lines/areas takes a long time (>1 minute), which 
means you need a sort of "slippy tiles" setup like I have currently. 
That lets the user pan the map in real-time while the renderer renders 
tiles off-screen.
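
Rolling your own dashed lines boils down to walking along each segment 
and alternating pen-down and pen-up steps. A minimal sketch for a 
single segment (no styling, names are illustrative):

  // Draw a dashed line from (x1,y1) to (x2,y2) on a 2D canvas context.
  function dashedLine(ctx, x1, y1, x2, y2, dashLen, gapLen) {
    var dx = x2 - x1, dy = y2 - y1;
    var dist = Math.sqrt(dx * dx + dy * dy);
    if (!dist) return;                     // zero-length segment
    var ux = dx / dist, uy = dy / dist;    // unit vector along the line
    var pos = 0, penDown = true;
    ctx.beginPath();
    ctx.moveTo(x1, y1);
    while (pos < dist) {
      var step = Math.min(penDown ? dashLen : gapLen, dist - pos);
      pos += step;
      var x = x1 + ux * pos, y = y1 + uy * pos;
      if (penDown) { ctx.lineTo(x, y); } else { ctx.moveTo(x, y); }
      penDown = !penDown;
    }
    ctx.stroke();
  }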

All the best,

Tels

-- 
 Signed on Fri May  8 20:47:12 2009 with key 0x93B84C15.
 Get one of my photo posters: http://bloodgate.com/posters
 PGP key on http://bloodgate.com/tels.asc or per email.

 "If Duke Nukem Forever is not out in 2001, something's very wrong."

  -- George Broussard, 2001 (http://tinyurl.com/6m8nh)