<br><div><span class="gmail_quote">On 7/27/07, <b class="gmail_sendername">Andy Allan</b> <<a href="mailto:gravitystorm@gmail.com">gravitystorm@gmail.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On 7/25/07, spaetz <<a href="mailto:osm@sspaeth.de">osm@sspaeth.de</a>> wrote:<br>> Hi all, I was looking at the planet generation and archiving process. Currently we archive them both as bz2 and as 7z files. Download stats show that in 8 days the bz2 was retrieved nearly 15000 times while the 7z was retrieved about 500 times. Should we continue to bother with 7z, given that disk space on the dev server is not unlimited?
<br>><br>> Also I would like to raise the question of planet diffs again. Would people appreciate 4-weekly full dumps and planet diffs in between? As most of the data remains the same, we could save quite a bit of disk space with that, I guess.
<br>> The catch, IMHO, is that the files are too big to be handled with standard diff tools, so we (you) would have to use one that can cope with those files (somebody posted them previously, I forgot who).<br>><br>> What do people think?
<br><br>I've been thinking about this some more.<br><br>bz2 is favoured over 7z by a ratio of 30:1. But the 7z files are<br>around 20% smaller than the bz2 files. Given that installing 7z is<br>trivial, but still appears to be a massive barrier, we can assume that
<br>download size is almost irrelevant for our consumers.<br><br>(Personal experience: download size has implications for both time<br>taken and disk usage. But both are dwarfed in comparison to osm2pgsql<br>(time), rendering tiles (both), uploading to my host (time) etc)
<br><br>So given that the consumers react "inelastically" to download size, we<br>should make sure the most consumer-friendly downloads are available at<br>all times. This would appear to be full .bz2 planet dumps, and not
<br>either .7z or diffs.<br><br>However, we have internal considerations - the hit on the db of<br>generating the planet files, and where to store them. (Please note<br>that bandwidth use from our servers is completely irrelevant, due to
<br>the university hosting). I hold planet generation of high importance<br>to the project, since it can't be recreated independently (unlike, for<br>example, cycle layers or t@h). It would seem a ripe target for a few
<br>hundred quid to get a box with a few terabytes of disk space that does<br>nothing other than compress and serve full planets, or trade for some<br>other resource off of dev.</blockquote><div><br>This machine (<a href="http://munin.openstreetmap.org/openstreetmap/db2.openstreetmap.html">
http://munin.openstreetmap.org/openstreetmap/db2.openstreetmap.html</a>) can be used for this purpose. It's just sitting there helping to heat the planet as far as I can see.<br><br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
If generation of diffs from the db directly is a feasible way of<br>extracting data more frequently, then it should be done, and the diffs<br>used to generate full planets. (I'd love to see them daily, but I'm<br>
not sure the db would cope. Can it do stuff-modified-today more easily<br>than full dumps?)<br><br>If there's no way to generate diffs other than having two planets to<br>start with, then we should still do so, but bear in mind that there
<br>appears to be very little demand for smaller downloads (c.f. 30:1 .bz2<br>to .7z stats)<br><br>Cheers,<br>Andy<br></blockquote></div><br>