[OSM-dev] Planet diff's revisited
Andy Allan
gravitystorm at gmail.com
Fri Jul 27 10:01:00 BST 2007
On 7/25/07, spaetz <osm at sspaeth.de> wrote:
> Hi all, I was looking at the planet generation and archiving process. Currently we archive them both as bz2 and as 7z files. Download stats show that in 8 days the bz2 has been retrieved nearly 15000 times while the 7z was retrieved about 500 times. Should we continue to bother with 7z, given that disk space on the dev server is not unlimited?
>
> Also I would like to raise the question of planet diffs again. Would people appreciate 4-weekly full dumps and planet diffs in between? As most of the data remains the same, we could save quite a bit of disk space with that, I guess.
> The catch is IMHO, that the files are too big to be handled with std diff tools, so we (you) would have to use one that can cope with those files (somebody posted them previously, I forgot who).
>
> What do people think?
I've been thinking about this some more.
bz2 is favoured over 7z by a ratio of 30:1. But the 7z files are
around 20% smaller than the bz2 files. Given that installing 7z is
trivial, but still appears to be a massive barrier, we can assume that
download size is almost irrelevant for our consumers.
(Personal experience: download size has implications for both time
taken and disk usage. But both are dwarfed in comparison to osm2pgsql
(time), rendering tiles (both), uploading to my host (time) etc)
So given that the consumers react "inelastically" to download size, we
should make sure the most consumer-friendly downloads are available at
all times. This would appear to be full .bz2 planet dumps, and not
either .7z or diffs.
However, we have internal considerations - the hit on the db of
generating the planet files, and where to store them. (Please note
that bandwidth use from our servers is completely irrelevant, due to
the university hosting). I hold planet generation of high importance
to the project, since it can't be recreated independently (unlike, for
example, cycle layers or t@h). It would seem a ripe target for a few
hundred quid to get a box with a few terabytes of disk space that does
nothing other than compress and serve full planets, or trade for some
other resource off of dev.
If generation of diffs from the db directly is a feasible way of
extracting data more frequently, then it should be done, and the diffs
used to generate full planets. (I'd love to see them daily, but I'm
not sure the db would cope. Can it do stuff-modified-today more easily
than full dumps?)
If there's no way to generate diffs other than having two planets to
start with, then we should still do so, but bear in mind that there
appears to be very little demand for smaller downloads (cf. the 30:1 .bz2
to .7z stats).
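For the two-planets case, here is a rough sketch of how a diff could be
produced without loading either file into memory. It assumes (and this is
only an assumption, not how the planet dump actually works) that both
dumps can be read as streams of (id, content) pairs in ascending id order.
The name planet_diff and the tuple representation are made up for
illustration.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical representation: (element id, serialized element content).
Element = Tuple[int, str]


def planet_diff(old: Iterable[Element],
                new: Iterable[Element]) -> Iterator[Tuple[str, int]]:
    """Merge two id-sorted element streams and yield (action, id) pairs.

    Only one element from each stream is held at a time, so memory use
    stays constant regardless of planet size.
    """
    old_it, new_it = iter(old), iter(new)
    o = next(old_it, None)
    n = next(new_it, None)
    while o is not None or n is not None:
        if n is None or (o is not None and o[0] < n[0]):
            # Present in the old planet only.
            yield ("delete", o[0])
            o = next(old_it, None)
        elif o is None or n[0] < o[0]:
            # Present in the new planet only.
            yield ("create", n[0])
            n = next(new_it, None)
        else:
            # Same id in both: report it only if the content changed.
            if o[1] != n[1]:
                yield ("modify", n[0])
            o = next(old_it, None)
            n = next(new_it, None)
```

Feeding it two small lists, e.g. list(planet_diff(old, new)), gives the
deletes, creates and modifies in id order. A real tool would also need to
parse the XML and write the changes out in some agreed format.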
Cheers,
Andy