Great, this is a good discussion. I've put up a wiki page with some of the things we've covered, with pros/cons. I hope we can keep talking about our approaches and, as we optimize for different problems, post some of the results back up here:<div>
<br></div><div><font class="Apple-style-span" face="'Lucida Grande'" size="3"><span class="Apple-style-span" style="font-size: 11px; white-space: pre-wrap; -webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px;"><a href="http://code.google.com/p/cartagen/wiki/FeatureTradeoff">http://code.google.com/p/cartagen/wiki/FeatureTradeoff</a></span></font></div>
<div><font class="Apple-style-span" face="'Lucida Grande'" size="3"><span class="Apple-style-span" style="font-size: 11px; white-space: pre-wrap; -webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px;"><br>
</span></font></div><div><font class="Apple-style-span" face="'Lucida Grande'" size="3"><span class="Apple-style-span" style="font-size: 11px; white-space: pre-wrap; -webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px;"><span class="Apple-style-span" style="font-family: arial; font-size: 13px; white-space: normal; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; ">I put in what I could gather about Temap, but feel free to update and add more pros and cons... this is just my thought process so far. We might also add a "status" column so we can annotate what we learn from each approach.</span></span></font></div>
<div><br></div><div>Best,</div><div><font class="Apple-style-span" face="'Lucida Grande'" size="3"><span class="Apple-style-span" style="font-size: 11px; white-space: pre-wrap; -webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px;"><font class="Apple-style-span" face="arial" size="3"><span class="Apple-style-span" style="font-size: 13px; white-space: normal; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px;">Jeff<br>
</span></font></span></font><br><div class="gmail_quote">On Fri, May 8, 2009 at 3:00 PM, Tels <span dir="ltr">&lt;<a href="mailto:nospam-abuse@bloodgate.com">nospam-abuse@bloodgate.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Moin,<br>
<div class="im"><br>
On Friday 08 May 2009 20:04:48 you wrote:<br>
> > * The proxy receives XML from the api or xapi server. Currently it<br>
> > requests the full dataset.<br>
> > * Then it removes unnec. tags (like note, fixme, attribution and a<br>
> > whole bunch of others that are not needed for rendering). Some of<br>
> > them are very minor, but 10000 nodes with "attribution=veryvery<br>
> > long string here" can make up like 40% of all the data, and just<br>
> > clog the line and browser :)<br>
><br>
> Yes, I'm thinking of trying to cache locally but still request<br>
> changesets if the ?live=true tag is set... caching locally is great<br>
> for more static data but for the live viewer, I'm trying to not use<br>
> caching, but increase efficiency in the requests.<br>
<br>
</div>I fear loading data live from the API server is just not feasible,<br>
unless you:<br>
<br>
* only load diffs (minute-diffs?) and use them to update the data already<br>
cached at the proxy. OTOH I read that importing a one-hour diff into a<br>
postgres database can take 40..70 minutes, i.e. depending on load you<br>
might not even manage to update your DB with the diffs fast enough...<br>
* invent an API server that is about 1000 times faster :)<br>
* never zoom out from level 18, anything below will request so much<br>
data that you can't get it live :)<br>
<br>
Currently I consider "live-view" not an achievable goal, I am happy if I<br>
can render data that is about 1 day or so old.<br>
<div class="im"><br>
> > * The data is then pruned into (currently 3) levels and stored in a<br>
> > cache:<br>
> > * level 0 - full<br>
> > * level 1 - no POI, no paths, streams, tracks etc. used for zoom 11<br>
> > * level 2 - no tertiary roads etc. used for zoom 10 and below<br>
> > * The client is served the level it currently requested as JSON.gz.<br>
><br>
> Great, this is what I'm working on too. I'm thinking a ruleset about<br>
> what features are relevant for what zoom levels could be something to<br>
> work together on? I was also thinking of correlating tags with a<br>
> certain zoom level. But maybe each tag should be associated with a<br>
> range of zoom levels, like "way: { zoom_outer: 3, zoom_inner: 1 }".<br>
> Thoughts?<br>
<br>
</div>My rules do have a minimum zoom level; below that they are<br>
not rendered. The levels are inspired by the Osmarender and Mapnik<br>
outputs, but I moved a few of them down so you can render really high<br>
resolution maps.<br>
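<br>
A minimal Python sketch of such a minimum-zoom rule. The rule table, the zoom numbers and the helper names are hypothetical and only illustrate the idea; they are not the renderer's actual rules.<br>
<pre>
```python
# Hypothetical rule table: (key, value) -> minimum zoom level.
# The numbers are illustrative only, not the renderer's real levels.
MIN_ZOOM = {
    ("highway", "motorway"): 6,
    ("highway", "tertiary"): 11,
    ("highway", "footway"): 14,
}
DEFAULT_MIN_ZOOM = 16  # assumption: unknown features only show when zoomed far in


def min_zoom_for(tags):
    """Return the smallest zoom at which any tag of the feature is rendered."""
    levels = [MIN_ZOOM[(k, v)] for k, v in tags.items() if (k, v) in MIN_ZOOM]
    return min(levels) if levels else DEFAULT_MIN_ZOOM


def is_rendered(tags, zoom):
    """A feature is drawn only at or above its minimum zoom level."""
    return zoom >= min_zoom_for(tags)
```
</pre>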
<br>
However, the pruning at the proxy is something else and not connected to<br>
that. For instance, somebody might not want to see tertiary roads on<br>
level 13, but others do. So I make sure that I only prune out data<br>
that can never be seen on that level, i.e. a conservative<br>
pruning.<br>
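<br>
A rough Python sketch of how pruned cache levels could be selected and built, under stated assumptions: the drop sets follow the three level descriptions quoted in this thread (the "etc." entries are not filled in), the tag pairs are guesses at how those features are tagged, serving level 0 from zoom 12 upwards is inferred, and all names are hypothetical.<br>
<pre>
```python
# Assumption: features dropped per cache level, based on the level
# descriptions in the thread. Level 0 keeps everything.
DROPPED = {
    0: set(),
    1: {("highway", "path"), ("highway", "track"), ("waterway", "stream")},
    2: {("highway", "path"), ("highway", "track"), ("waterway", "stream"),
        ("highway", "tertiary")},
}


def cache_level_for_zoom(zoom):
    """Pick the pruned cache level to serve for a requested zoom."""
    if zoom >= 12:
        return 0  # full data (inferred: only 11 and below are pruned)
    if zoom == 11:
        return 1
    return 2  # zoom 10 and below


def prune(ways, level):
    """Keep every way unless one of its tags is in the level's drop set."""
    drop = DROPPED[level]
    return [w for w in ways
            if not any(item in drop for item in w["tags"].items())]
```
</pre>
Anything not explicitly listed in a drop set is kept, which is what makes the pruning conservative.<br>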
<br>
Also, about 90% of the data-pruning is about removing unwanted data<br>
(like "note=blah" :) and not about the smaller zoom levels because<br>
currently it is simply not feasible to render below 10 and even for<br>
zoom 10 you need a really really beefy machine and a long wait time....<br>
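<br>
To make the tag stripping concrete, here is a minimal Python sketch that drops unwanted tags from OSM XML and emits compact JSON. The tag list only contains the three tags named in this thread, and the function name and the JSON layout are assumptions, not the format the proxy actually emits.<br>
<pre>
```python
import json
import xml.etree.ElementTree as ET

# Tags named in the thread as not needed for rendering; a real list is longer.
UNWANTED_TAGS = {"note", "fixme", "attribution"}


def osm_xml_to_json(xml_text):
    """Convert OSM XML to a compact JSON string, dropping unwanted tags."""
    root = ET.fromstring(xml_text)
    nodes, ways = [], []
    for el in root:
        # Collect the element's tags, skipping the ones we never render.
        tags = {
            t.get("k"): t.get("v")
            for t in el.findall("tag")
            if t.get("k") not in UNWANTED_TAGS
        }
        if el.tag == "node":
            nodes.append({"id": int(el.get("id")),
                          "lat": float(el.get("lat")),
                          "lon": float(el.get("lon")),
                          "tags": tags})
        elif el.tag == "way":
            refs = [int(nd.get("ref")) for nd in el.findall("nd")]
            ways.append({"id": int(el.get("id")), "nodes": refs, "tags": tags})
    # Compact separators avoid spending bytes on whitespace.
    return json.dumps({"nodes": nodes, "ways": ways}, separators=(",", ":"))
```
</pre>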
<div class="im"><br>
> > * There are three servers in the list (api.openstreetmap,<br>
> > xapi.informationfreeway and tagwatch) and a lot of them do not<br>
> > complete the request (internal error, not implemented etc. etc.).<br>
> > It can take a lot of retries to finally get the data.<br>
> > * Even when you get the data, it takes seconds (10..40 seconds<br>
> > is "normal") to minutes - upwards to 360 seconds just to serve one<br>
> > request.<br>
> ><br>
> > So currently all received data is stored in the cache for 7 days to<br>
> > avoid the very very long loading times.<br>
> ><br>
> > Ideas of fetching the full dataset and pre-computing the cache<br>
> > simply don't work because I don't have a big enough machine and no<br>
> > big enough online account to store the resulting JSON :(<br>
> ><br>
> ><br>
> ><br>
> > Also, somehow processing 150 Gbyte XML into JSON will prove to be a<br>
> > challenge :)<br>
><br>
> So I'm having the same problems with the APIs. The standard 0.6 api<br>
> has been pretty good but of course it serves XML, not JSON. The xapi<br>
> is not very responsive to me, it seems.<br>
<br>
</div>Neither for me, but the API server is very slow, too. It seems it can't<br>
manage to send me more than 17Kbyte/s (but maybe it is bandwidth<br>
limited?).<br>
<div class="im"><br>
> I thought parsing XML in JS<br>
> would be molasses,<br>
<br>
</div>When I tried it, it used ungodly amounts of memory (because the data<br>
structure is not useful for rendering and it contains so much cruft),<br>
and I also never managed to extract the actual node data for rendering<br>
from it...<br>
<div class="im"><br>
> so if you're interested, we should put up our own<br>
> XAPI or custom api off the planet.osm file, and send JSON?<br>
<br>
</div>Yeah, that was my plan for the near future :) For now I am happy with my<br>
proxy as there is quite enough to do on the client side whether the data<br>
is current/real-time or 1 day old :)<br>
<div class="im"><br>
> I have a quad-core Intel Mac Pro with 1.5 TB and a bunch of RAM we<br>
> can dedicate to this effort, with plenty of bandwidth. And perhaps<br>
> when Stefan's work is published, we could run it as well, since it<br>
> seems to be a great solution to requesting fewer nodes for large<br>
> ways... but for now do you think you could use an XAPI? i think all<br>
> my requests fit into that api.<br>
<br>
</div>Currently I am just requesting the full data and then prune it myself<br>
because I am not actually sure it would help if we do either:<br>
<br>
* request partial data (all streets, all landuse), simply because at one<br>
zoom level high enough you need the full data, anyway<br>
* request ways with fewer nodes, because that is only good for low zooms<br>
and I am currently sort of ignoring them :) It is basically a<br>
side-problem.<br>
<div class="im"><br>
> Alternatively, Stefan points out that the dbslayer patch for the<br>
> Cherokee server allows direct JSON requests to a database. So some<br>
> very thin db wrapper might serve us for now? This isn't my area of<br>
> expertise, so if you have better ideas on how to generate JSON direct<br>
> from the db, like GeoServer or something, and still have tag-based<br>
> requests, i'm all ears.<br>
<br>
</div>Well, I am not sure that this would be faster or better. If the db-json<br>
would serve the full API data, we would also get all the "junk" data<br>
like "note" and so on, and this will overwhelm the browser. So it might<br>
need a filter, too.<br>
<br>
Also, my renderer expects the format currently spewed by my proxy. If we<br>
use stevens format, it wouldn't work (multipolygons are one reason) and<br>
it would be a lot of work to switch the code.<br>
<br>
OTOH, I would not complain if somebody invents a server that spits out<br>
JSON in the right format and in real-time :)<br>
<div class="im"><br>
> > Yes, but reducing the polygons is also a lot of work :) I haven't<br>
> > started on this yet, because on zoom 12 or higher you need to<br>
> > render almost anything, anyways. Plus, you would then need to cache<br>
> > the partial data somehow (computing it is expensive in JS..)<br>
><br>
> Seems like Stefan's work may address this, no? Or if we did cache it,<br>
> seems like we'd calculate it on the server side.<br>
<br>
</div>I was kinda hoping to build a client-side application, not something<br>
that runs on the server :) If the server has to reduce the polygons, it<br>
might never be able to process the whole planet.<br>
<br>
But I see the point. :)<br>
<br>
(I was f.i. pondering if the JSON from the server should already contain<br>
BBOX data for each way. Decided against it, as it uses bandwidth and<br>
server CPU and is very fast to compute on the client anyway. But<br>
definitely a few things can be precomputed at the server and stored in<br>
the cache. One example is the multipolygon relations. Their<br>
representation in XML isn't actually very usable, so I just rewrite it<br>
so that the client can access it super-fast).<br>
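<br>
For illustration, computing a per-way bounding box is a single pass over the way's nodes. The sketch below is in Python rather than the client's JavaScript, and the data layout (a dict mapping node id to a lon/lat pair) and the function names are assumptions.<br>
<pre>
```python
def way_bbox(way_node_ids, nodes):
    """Return (min_lon, min_lat, max_lon, max_lat) for one way.

    nodes maps a node id to a (lon, lat) pair; this layout is an assumption.
    """
    lons = [nodes[n][0] for n in way_node_ids]
    lats = [nodes[n][1] for n in way_node_ids]
    return (min(lons), min(lats), max(lons), max(lats))


def bboxes_overlap(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
```
</pre>
A renderer could use the overlap test to skip ways that lie entirely outside the visible area.<br>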
<div class="im"><br>
> > > d) oh, and localStorage. I've partially implemented that but<br>
> > > haven't had much testing... other work... ugh. So caching on a<br>
> > > few levels, basically.<br>
> ><br>
> > I fail to see what localstorage actually gains, as the delivered<br>
> > JSON is put into the browser cache, anyway and the rest is cached<br>
> > in memory. Could you maybe explain what your idea was?<br>
><br>
> Yes, localStorage persists across sessions so you could build up a<br>
> permanent local cache and have more control (in JS) over requesting<br>
> it and timestamping when you cached it, not to mention applying only<br>
> changesets and not complete cache flushes. This has some advantages<br>
> over the browser cache, although that does of course persist across<br>
> sessions too.<br>
<br>
</div>But it won't help if you move to a different machine. Also, it goes<br>
against the "live", we would need to query the server for new data,<br>
anyway. Currently, if you reload a temap-session, most of the time is<br>
spent in the rerender, and almost none in loading the data over the<br>
net.<br>
<br>
I guess if I write a 100x faster renderer, that might change, but I'd<br>
like to work on one problem at a time :)<br>
<br>
So for now I'd like to keep localStorage out as it creates more problems<br>
than it solves :)<br>
<div class="im"><br>
> > * There is a talk I proposed for State of the Map and I don't want<br>
> > to spoil everything before :)<br>
><br>
> yes, me too! so if you want to discuss off-list that's fine.<br>
<br>
</div>Heh, you have a talk scheduled, too? :) That sounds like fun :)<br>
<div class="im"><br>
> > Of course, semi-dynamic rules like "color them according to feature X by<br>
> > formula Y" are still useful and fun, and avoid the problems above.<br>
> > (Like: "use maxspeed as the color index ranging from red over green<br>
> > to yellow" :).<br>
><br>
> Yes, this is an exciting area to me, for example the color by<br>
> authorship stylesheet i posted before:<br>
><br>
> <a href="http://map.cartagen.org/find?id=paris&gss=http://unterbahn.com/cartagen/authors.gss" target="_blank">http://map.cartagen.org/find?id=paris&gss=http://unterbahn.com/cartagen/authors.gss</a><br>
><br>
> or this one i threw together yesterday, based on the tags of measured<br>
> width instead of on a width rule:<br>
><br>
> <a href="http://map.cartagen.org?gss=http://unterbahn.com/cartagen/width.gss" target="_blank">http://map.cartagen.org?gss=http://unterbahn.com/cartagen/width.gss</a><br>
><br>
> A more fully-rendered screenshot is here:<br>
><br>
> <a href="http://www.flickr.com/photos/jeffreywarren/3510685883/" target="_blank">http://www.flickr.com/photos/jeffreywarren/3510685883/</a><br>
<br>
</div>Yeah, that is what I have in mind, too. But so many things to do, so<br>
little time :)<br>
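<br>
A semi-dynamic rule such as "use maxspeed as the color index ranging from red over green to yellow" can be sketched as a piecewise linear color ramp. This Python sketch is only an illustration: the bounds of 30 and 130 and the function names are assumptions, not part of either renderer.<br>
<pre>
```python
def lerp_color(c0, c1, t):
    """Linearly interpolate between two (r, g, b) colors, t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


RED, GREEN, YELLOW = (255, 0, 0), (0, 255, 0), (255, 255, 0)


def maxspeed_color(maxspeed, lo=30, hi=130):
    """Map a maxspeed value to a color on a red -> green -> yellow ramp.

    lo and hi are assumed bounds; values outside them are clamped.
    """
    t = (min(max(maxspeed, lo), hi) - lo) / (hi - lo)
    if t <= 0.5:
        return lerp_color(RED, GREEN, t * 2)       # first half: red to green
    return lerp_color(GREEN, YELLOW, (t - 0.5) * 2)  # second half: green to yellow
```
</pre>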
<div class="im"><br>
> Anyways, thanks for sharing; one thought I had was that besides<br>
> sharing ideas and solutions online, we should try *different*<br>
> approaches, so that we try all the possibilities. I think multiple<br>
> projects working on the same problem can sometimes be redundant, but<br>
> more often it's beneficial for all parties since there's a diversity<br>
> of approaches to a problem. Let's take advantage of that by<br>
> specifically attempting different solutions to the problems we face,<br>
> and discussing the results... if you're willing. If one of us tries a<br>
> technique and it doesn't work, we can all learn from the attempt.<br>
<br>
</div>Sure, I am working on my ideas, anyway :) A few things you might find<br>
interesting:<br>
<br>
* no dashed lines on canvas, you need to roll your own<br>
* rendering 60000 lines/areas takes a long time (>1 minute), which means<br>
you need a sort of "slippy tiles" setup like I have currently. That<br>
allows the user to pan the map in real-time and the renderer can only<br>
render tiles off-screen.<br>
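<br>
One way to roll your own dashed lines is to split each straight segment into short dash pieces and stroke those individually. The Python sketch below only shows the geometry; the function name and signature are hypothetical, and a JavaScript port would draw each returned piece with the canvas moveTo/lineTo calls.<br>
<pre>
```python
import math


def dash_segments(p0, p1, dash, gap):
    """Split the straight line p0 -> p1 into dash segments.

    Returns a list of ((x0, y0), (x1, y1)) pairs, one per dash; the gaps
    are simply not returned. The last dash is clipped at p1.
    """
    length = math.hypot(p1[0] - p0[0], p1[1] - p0[1])
    if length == 0 or dash <= 0 or gap < 0:
        return []
    # Unit direction vector along the line.
    ux, uy = (p1[0] - p0[0]) / length, (p1[1] - p0[1]) / length
    segments = []
    pos = 0.0
    while pos < length:
        end = min(pos + dash, length)
        segments.append(((p0[0] + ux * pos, p0[1] + uy * pos),
                         (p0[0] + ux * end, p0[1] + uy * end)))
        pos += dash + gap
    return segments
```
</pre>
Note that this restarts the dash pattern at every vertex of a way; carrying the leftover dash length over to the next segment would give a smoother result on ways with many short segments.<br>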
<br>
All the best,<br>
<br>
Tels<br>
<br>
--<br>
Signed on Fri May 8 20:47:12 2009 with key 0x93B84C15.<br>
<div class="im"> Get one of my photo posters: <a href="http://bloodgate.com/posters" target="_blank">http://bloodgate.com/posters</a><br>
PGP key on <a href="http://bloodgate.com/tels.asc" target="_blank">http://bloodgate.com/tels.asc</a> or per email.<br>
<br>
</div> "If Duke Nukem Forever is not out in 2001, something's very wrong."<br>
<div><div></div><div class="h5"><br>
-- George Broussard, 2001 (<a href="http://tinyurl.com/6m8nh" target="_blank">http://tinyurl.com/6m8nh</a>)<br>
</div></div></blockquote></div><br></div>