Great, this is a good discussion. I've put up a wiki page with some of the things we've covered, with pros/cons. I hope we can keep talking about our approaches and, as we optimize for different problems, post some of what we learn back up here:<div>

<br></div><div><a href="http://code.google.com/p/cartagen/wiki/FeatureTradeoff">http://code.google.com/p/cartagen/wiki/FeatureTradeoff</a></div>

<div><br></div><div>I put in what I could gather about Temap, but feel free to update and add more pros and cons... this is just my thought process so far. We might also add a "status" column so we can annotate what we learn from each approach.</div>

<div><br></div><div>Best,</div><div>Jeff<br>

<br><div class="gmail_quote">On Fri, May 8, 2009 at 3:00 PM, Tels <span dir="ltr">&lt;<a href="mailto:nospam-abuse@bloodgate.com">nospam-abuse@bloodgate.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

Moin,<br>

<div class="im"><br>

On Friday 08 May 2009 20:04:48 you wrote:<br>

> > * The proxy receives XML from the api or xapi server. Currently it<br>

> > requests the full dataset.<br>

> > * Then it removes unnec. tags (like note, fixme, attribution and a<br>

> > whole bunch of others that are not needed for rendering). Some of<br>

> > them are very minor, but 10000 nodes with "attribution=veryvery<br>

> > long string here" can make up like 40% of all the data, and just<br>

> > clog the line and browser :)<br>

><br>

> Yes, I'm thinking of trying to cache locally but still request<br>

> changesets if the ?live=true tag is set... caching locally is great<br>

> for more static data but for the live viewer, I'm trying to not use<br>

> caching, but increase efficiency in the requests.<br>

<br>

</div>I fear loading data live from the API server is just not feasible,<br>

unless you:<br>

<br>

* only load diffs (minute-diffs?) and apply them to the data already cached<br>

at the proxy. OTOH I read that importing a one-hour diff into a<br>

postgres database can take 40..70 minutes, i.e. depending on load you<br>

might not even manage to update your DB with the diffs fast enough...<br>

* invent an API server that is about 1000 times faster :)<br>

* never zoom out from level 18; anything below that requests so much<br>

data that you can't get it live :)<br>

<br>

Currently I consider "live-view" not an achievable goal; I am happy if I<br>

can render data that is about 1 day or so old.<br>

<div class="im"><br>

> > * The data is then pruned into (currently 3) levels and stored in a<br>

> > cache:<br>

> >  * level 0 - full<br>

> >  * level 1 - no POI, no paths, streams, tracks etc. used for zoom 11<br>

> >  * level 2 - no tertiary roads etc. used for zoom 10 and below<br>

> > * The client is served the level it currently requested as JSON.gz.<br>

><br>

> Great, this is what I'm working on too. I'm thinking a ruleset about<br>

> what features are relevant for what zoom levels could be something to<br>

> work together on? I was also thinking of correlating tags with a<br>

> certain zoom level. But maybe each tag should be associated with a<br>

> range of zoom levels, like "way: { zoom_outer: 3, zoom_inner: 1 }".<br>

> Thoughts?<br>

<br>

</div>My rules do have a minimum zoom level; below that they are<br>

not rendered. The levels are inspired by the Osmarender and Mapnik<br>

outputs, but I moved a few of them down so you can render really high<br>

resolution maps.<br>

<br>

However, the pruning at the proxy is something else and not connected to<br>

that. For instance, somebody might not want to see tertiary roads on<br>

level 13, but others do. So I make sure that I only prune out data<br>

that can never be seen on that level, i.e. a conservative<br>

pruning.<br>
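<br>

A minimal sketch of such conservative pruning, in Python. The tag pairs and the mapping of zoom 12 and above to level 0 are assumptions for illustration, not the proxy's actual rule set; only the three levels themselves come from the description quoted above:<br>

<pre>
```python
# Hypothetical (key, value) tag pairs dropped at each prune level.
# The real rule set is not given in this thread.
LEVEL_1_DROPS = {("highway", "path"), ("highway", "track"), ("waterway", "stream")}
LEVEL_2_DROPS = LEVEL_1_DROPS | {("highway", "tertiary")}

PRUNE_RULES = {
    0: set(),          # level 0: full data
    1: LEVEL_1_DROPS,  # level 1: used for zoom 11
    2: LEVEL_2_DROPS,  # level 2: used for zoom 10 and below
}


def prune_level_for_zoom(zoom):
    """Pick the prune level for a zoom level (zoom 12+ mapping is assumed)."""
    if zoom >= 12:
        return 0
    if zoom == 11:
        return 1
    return 2


def prune_ways(ways, zoom):
    """Return only the ways that could still be visible at this zoom level."""
    dropped = PRUNE_RULES[prune_level_for_zoom(zoom)]
    kept = []
    for way in ways:
        tags = way.get("tags", {})
        if not any(tags.get(key) == value for key, value in dropped):
            kept.append(way)
    return kept
```
</pre>

The point of keeping the drop lists small is that a style sheet can still hide more on the client, but it can never bring back what the proxy removed.<br>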

<br>

Also, about 90% of the data-pruning is about removing unwanted data<br>

(like "note=blah" :) and not about the smaller zoom levels because<br>

currently it is simply not feasible to render below 10 and even for<br>

zoom 10 you need a really really beefy machine and a long wait time....<br>
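<br>

As a rough sketch of the "remove unwanted data" part, in Python: the three keys below are the ones named in this thread, the full list is not given here, and the element layout (a dict with a "tags" dict) is an assumption:<br>

<pre>
```python
# Keys named in the thread as not needed for rendering; the full list is longer.
JUNK_KEYS = {"note", "fixme", "attribution"}


def strip_junk_tags(elements):
    """Return copies of the elements with non-rendering tags removed."""
    cleaned = []
    for element in elements:
        tags = {
            key: value
            for key, value in element.get("tags", {}).items()
            if key.lower() not in JUNK_KEYS
        }
        # dict(element, tags=...) copies the element, so the input is untouched.
        cleaned.append(dict(element, tags=tags))
    return cleaned
```
</pre>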

<div class="im"><br>

> > * There are three servers in the list (api.openstreetmap,<br>

> > xapi.informationfreeway and tagwatch) and a lot of them do not<br>

> > complete the request (internal error, not implemented etc. etc.).<br>

> > It can take a lot of retries to finally get the data.<br>

> > * Even when you get the data, it takes seconds (10..40 seconds<br>

> > is "normal") to minutes - upwards to 360 seconds just to serve one<br>

> > request.<br>

> ><br>

> > So currently all received data is stored in the cache for 7 days to<br>

> > avoid the very very long loading times.<br>

> ><br>

> > Ideas of fetching the full dataset and pre-computing the cache<br>

> > simply don't work because I don't have a big enough machine and no<br>

> > big enough online account to store the resulting JSON :(<br>

> ><br>

> ><br>

> ><br>

> > Also, somehow processing 150 Gbyte XML into JSON will prove to be a<br>

> > challenge :)<br>

><br>

> So I'm having the same problems with the APIs. The standard 0.6 api<br>

> has been pretty good but of course it serves XML, not JSON. The xapi<br>

> is not very responsive to me, it seems.<br>

<br>

</div>Neither for me, but the API server is very slow, too. It seems it can't<br>

manage to send me more than 17Kbyte/s (but maybe it is bandwidth<br>

limited?).<br>
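<br>

The retry-and-cache behaviour quoted above (several servers, many failed requests, results kept for 7 days) could be sketched like this in Python. The class name, the fetcher callables, and the use of OSError to signal a failed request are all assumptions for illustration, not the proxy's actual code:<br>

<pre>
```python
import time

CACHE_TTL_SECONDS = 7 * 24 * 3600  # keep received data for 7 days


class CachingFetcher:
    """Try each server in turn, retry on failure, and cache what comes back."""

    def __init__(self, fetchers, max_rounds=3, clock=time.time):
        self.fetchers = fetchers      # callables: bbox to data, raising OSError on failure
        self.max_rounds = max_rounds  # how often to go through the server list
        self.clock = clock
        self.entries = {}             # bbox to (timestamp, data)

    def get(self, bbox):
        hit = self.entries.get(bbox)
        if hit is not None and CACHE_TTL_SECONDS > self.clock() - hit[0]:
            return hit[1]
        for _ in range(self.max_rounds):
            for fetch in self.fetchers:
                try:
                    data = fetch(bbox)
                except OSError:
                    continue  # internal error, not implemented, etc.: try the next one
                self.entries[bbox] = (self.clock(), data)
                return data
        raise RuntimeError("no server completed the request")
```
</pre>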

<div class="im"><br>

> I thought parsing XML in JS<br>

> would be molasses,<br>

<br>

</div>When I tried it, it used ungodly amounts of memory (because the data<br>

structure is not useful for rendering and it contains so much cruft),<br>

and I also never managed to extract the actual node data for rendering<br>

from it...<br>

<div class="im"><br>

> so if you're interested, we should put up our own<br>

> XAPI or custom api off the planet.osm file, and send JSON?<br>

<br>

</div>Yeah, that was my plan for the near future :) For now I am happy with my<br>

proxy as there is quite enough to do on the client side whether the data<br>

is current/real-time or 1 day old :)<br>

<div class="im"><br>

> I have a quad-core Intel Mac Pro with 1.5 TB and a bunch of RAM we<br>

> can dedicate to this effort, with plenty of bandwidth. And perhaps<br>

> when Stefan's work is published, we could run it as well, since it<br>

> seems to be a great solution to requesting fewer nodes for large<br>

> ways... but for now do you think you could use an XAPI? i think all<br>

> my requests fit into that api.<br>

<br>

</div>Currently I am just requesting the full data and then prune it myself<br>

because I am not actually sure it would help if we do either:<br>

<br>

* request partial data (all streets, all landuse), simply because at a<br>

high enough zoom level you need the full data anyway<br>

* request ways with fewer nodes, because that is only good for low zooms<br>

and I am currently sort of ignoring them :) It is basically a<br>

side-problem.<br>

<div class="im"><br>

> Alternatively, Stefan points out that the dbslayer patch for the<br>

> Cherokee server allows direct JSON requests to a database. So some<br>

> very thin db wrapper might serve us for now? This isn't my area of<br>

> expertise, so if you have better ideas on how to generate JSON direct<br>

> from the db, like GeoServer or something, and still have tag-based<br>

> requests, i'm all ears.<br>

<br>

</div>Well, I am not sure that this would be faster or better. If the db-json<br>

would serve the full API data, we would also get all the "junk" data<br>

like "note" and so on, and this will overwhelm the browser. So it might<br>

need a filter, too.<br>

<br>

Also, my renderer expects the format currently spewed by my proxy. If we<br>

use Stefan's format, it wouldn't work (multipolygons are one reason) and<br>

it would be a lot of work to switch the code.<br>

<br>

OTOH, I would not complain if somebody invents a server that spits out<br>

JSON in the right format and in real-time :)<br>

<div class="im"><br>

> > Yes, but reducing the polygons is also a lot of work :) I haven't<br>

> > started on this yet, because on zoom 12 or higher you need to<br>

> > render almost anything, anyways. Plus, you would then need to cache<br>

> > the partial data somehow (computing it is expensive in JS..)<br>

><br>

> Seems like Stefan's work may address this, no? Or if we did cache it,<br>

> seems like we'd calculate it on the server side.<br>

<br>

</div>I was kinda hoping to build a client-side application, not something<br>

that runs on the server :) If the server has to reduce the polygons, it<br>

might never be able to process the whole planet.<br>

<br>

But I see the point. :)<br>

<br>

(I was f.i. pondering if the JSON from the server should already contain<br>

BBOX data for each way. Decided against it as it uses bandwidth and<br>

server CPU, and is very fast to compute on the client, anyway. But<br>

definitely a few things can be precomputed at the server and stored in<br>

the cache. One example is the multipolygon relationships. The<br>

representation in XML isn't actually very usable, so I just rewrite it<br>

so that the client can access it super-fast).<br>
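<br>

To illustrate why the bounding box is cheap to compute on the client, here is a minimal sketch in Python. It assumes a way is given as a non-empty list of (lat, lon) pairs, which is an assumption about the data layout and not the proxy's actual JSON format:<br>

<pre>
```python
def way_bbox(nodes):
    """Return (min_lat, min_lon, max_lat, max_lon) for a non-empty node list."""
    lats = [lat for lat, lon in nodes]
    lons = [lon for lat, lon in nodes]
    return (min(lats), min(lons), max(lats), max(lons))
```
</pre>

This is a single pass over the nodes of each way, so sending four extra numbers per way over the wire saves very little.<br>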

<div class="im"><br>

> > > d) oh, and localStorage. I've partially implemented that but<br>

> > > haven't had much testing... other work... ugh. So caching on a<br>

> > > few levels, basically.<br>

> ><br>

> > I fail to see what localstorage actually gains, as the delivered<br>

> > JSON is put into the browser cache, anyway and the rest is cached<br>

> > in memory. Could you maybe explain what your idea was?<br>

><br>

> Yes, localStorage persists across sessions so you could build up a<br>

> permanent local cache and have more control (in JS) over requesting<br>

> it and timestamping when you cached it, not to mention applying only<br>

> changesets and not complete cache flushes. This has some advantages<br>

> over the browser cache, although that does of course persist across<br>

> sessions too.<br>

<br>

</div>But it won't help if you move to a different machine. Also, it goes<br>

against the "live", we would need to query the server for new data,<br>

anyway. Currently, if you reload a temap-session, most of the time is<br>

spent in the rerender, and almost none in loading the data over the<br>

net.<br>

<br>

I guess if I write a 100x faster renderer, that might change, but I'd<br>

like to work on one problem at a time :)<br>

<br>

So for now I'd like to keep localStorage out as it creates more problems<br>

than it solves :)<br>

<div class="im"><br>

> > * There is a talk I proposed for State of the Map and I don't want<br>

> > to spoil everything before :)<br>

><br>

> yes, me too! so if you want to discuss off-list that's fine.<br>

<br>

</div>Heh, you have a talk scheduled, too? :) That sounds like fun :)<br>

<div class="im"><br>

> > Of course, semi-dynamic rules like "color them according to feature X by<br>

> > formula Y" are still useful and fun, and avoid the problems above.<br>

> > (Like: "use maxspeed as the color index ranging from red over green<br>

> > to yellow" :).<br>

><br>

> Yes, this is an exciting area to me, for example the color by<br>

> authorship stylesheet i posted before:<br>

><br>

> <a href="http://map.cartagen.org/find?id=paris&gss=http://unterbahn.com/cartagen/authors.gss" target="_blank">http://map.cartagen.org/find?id=paris&gss=http://unterbahn.com/cartagen/authors.gss</a><br>

><br>

> or this one i threw together yesterday, based on the tags of measured<br>

> width instead of on a width rule:<br>

><br>

> <a href="http://map.cartagen.org?gss=http://unterbahn.com/cartagen/width.gss" target="_blank">http://map.cartagen.org?gss=http://unterbahn.com/cartagen/width.gss</a><br>

><br>

> A more fully-rendered screenshot is here:<br>

><br>

> <a href="http://www.flickr.com/photos/jeffreywarren/3510685883/" target="_blank">http://www.flickr.com/photos/jeffreywarren/3510685883/</a><br>

<br>

</div>Yeah, that is what I have in mind, too. But so many things to do, so<br>

little time :)<br>
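<br>

One possible reading of the "use maxspeed as the color index" rule quoted above, sketched in Python. The color stops, the function name, and the linear interpolation between stops are assumptions; neither renderer's actual formula is given in this thread:<br>

<pre>
```python
# Color stops for "red over green to yellow", as RGB triples.
STOPS = [(255, 0, 0), (0, 255, 0), (255, 255, 0)]


def ramp_color(value, lo, hi):
    """Map value in [lo, hi] onto the color stops, clamping outside the range."""
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))
    scaled = t * (len(STOPS) - 1)
    index = min(int(scaled), len(STOPS) - 2)  # which pair of stops we are between
    frac = scaled - index
    start, end = STOPS[index], STOPS[index + 1]
    return tuple(round(a + (b - a) * frac) for a, b in zip(start, end))
```
</pre>

For example, ramp_color(way_maxspeed, 0, 130) would color a way by its maxspeed tag, where both way_maxspeed and the 0..130 range are placeholders.<br>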

<div class="im"><br>

> Anyways, thanks for sharing; one thought I had was that besides<br>

> sharing ideas and solutions online, we should try *different*<br>

> approaches, so that we try all the possibilities. I think multiple<br>

> projects working on the same problem can sometimes be redundant, but<br>

> more often it's beneficial for all parties since there's a diversity<br>

> of approaches to a problem. Let's take advantage of that by<br>

> specifically attempting different solutions to the problems we face,<br>

> and discussing the results... if you're willing. If one of us tries a<br>

> technique and it doesn't work, we can all learn from the attempt.<br>

<br>

</div>Sure, I am working on my ideas, anyway :) A few things you might find<br>

interesting:<br>

<br>

* no dashed lines on canvas, need to roll your own<br>

* rendering 60000 lines/areas takes a long time (>1minute), which means<br>

you need a sort of "slippy tiles" setup like I have currently. That<br>

allows the user to pan the map in real-time and the renderer can only<br>

render tiles off-screen.<br>
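<br>

For the "roll your own" dashed lines, the core is just splitting a straight segment into dash pieces that are then stroked one by one. A minimal sketch of that geometry in Python (the function name and parameters are made up for illustration; a real renderer would also carry the dash phase across the vertices of a polyline, which this does not do):<br>

<pre>
```python
import math


def dash_segments(x0, y0, x1, y1, dash, gap):
    """Split the line (x0, y0)-(x1, y1) into dashes of length dash, gap apart."""
    if not (dash > 0 and gap >= 0):
        raise ValueError("dash must be positive and gap non-negative")
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return []
    ux, uy = (x1 - x0) / length, (y1 - y0) / length  # unit direction vector
    segments = []
    pos = 0.0
    while length > pos:
        end = min(pos + dash, length)  # the last dash may be cut short
        segments.append(
            ((x0 + ux * pos, y0 + uy * pos), (x0 + ux * end, y0 + uy * end))
        )
        pos = end + gap
    return segments
```
</pre>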

<br>

All the best,<br>

<br>

Tels<br>

<br>

--<br>

 Signed on Fri May  8 20:47:12 2009 with key 0x93B84C15.<br>

<div class="im"> Get one of my photo posters: <a href="http://bloodgate.com/posters" target="_blank">http://bloodgate.com/posters</a><br>

 PGP key on <a href="http://bloodgate.com/tels.asc" target="_blank">http://bloodgate.com/tels.asc</a> or per email.<br>

<br>

</div> "If Duke Nukem Forever is not out in 2001, something's very wrong."<br>

<div><div></div><div class="h5"><br>

  -- George Broussard, 2001 (<a href="http://tinyurl.com/6m8nh" target="_blank">http://tinyurl.com/6m8nh</a>)<br>

</div></div></blockquote></div><br></div>