[OSM-dev] Planet diff's revisited

80n 80n80n at gmail.com
Fri Jul 27 11:59:27 BST 2007


On 7/27/07, Andy Allan <gravitystorm at gmail.com> wrote:
>
> On 7/25/07, spaetz <osm at sspaeth.de> wrote:
> > Hi all, I was looking at the planet generation and archiving process.
> > Currently we archive them both as bz2 and as 7z files. Download stats show
> > that in 8 days the bz2 has been retrieved nearly 15000 times while the 7z
> > was retrieved about 500 times. Should we continue to bother with 7z, given
> > that disk space on the dev server is not unlimited?
> >
> > Also I would like to raise the question of planet diffs again. Would
> > people appreciate 4-weekly full dumps and planet diffs in between? As most
> > of the data remains the same, we could save quite a bit of disk space with
> > that, I guess.
> > The catch, IMHO, is that the files are too big to be handled with standard
> > diff tools, so we (you) would have to use one that can cope with files of
> > that size (somebody posted them previously, I forgot who).
> >
> > What do people think?
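The point about standard diff tools can be made concrete. Line-based tools try to align arbitrary lines and typically hold large parts of both inputs in memory, but if both planets list their elements in ascending id order, a planet diff reduces to a single merge-join pass over the two files. A minimal Python sketch, assuming each planet has already been parsed into hypothetical ((type_rank, element_id), serialized_xml) tuples sorted by key; planet_diff and this tuple layout are illustrative only, not an existing OSM tool:

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical representation: an element is ((type_rank, element_id), serialized_xml),
# e.g. type_rank 0 for nodes, 1 for segments, 2 for ways.
Key = Tuple[int, int]
Element = Tuple[Key, str]
Change = Tuple[str, Key, str]  # (action, key, payload)


def planet_diff(old: Iterable[Element], new: Iterable[Element]) -> Iterator[Change]:
    """Merge-join two key-sorted element streams into create/modify/delete changes.

    Assumes both inputs are sorted by key with no duplicate keys. Only one
    element from each side is held at a time, so memory use does not grow
    with the size of the planet.
    """
    old_it, new_it = iter(old), iter(new)
    o: Optional[Element] = next(old_it, None)
    n: Optional[Element] = next(new_it, None)
    while o is not None or n is not None:
        if n is None or (o is not None and o[0] < n[0]):
            # Key only in the old planet: the element was deleted.
            yield ("delete", o[0], o[1])
            o = next(old_it, None)
        elif o is None or n[0] < o[0]:
            # Key only in the new planet: the element was created.
            yield ("create", n[0], n[1])
            n = next(new_it, None)
        else:
            # Same key on both sides: report a change only if the content differs.
            if o[1] != n[1]:
                yield ("modify", n[0], n[1])
            o = next(old_it, None)
            n = next(new_it, None)
```

Each step advances only the stream with the smaller key, so the cost is one sequential read of each file. In practice both inputs would come from a streaming XML parser over the decompressed planets, and the changes would be written out in whatever diff format is agreed on.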
>
> I've been thinking about this some more.
>
> bz2 is favoured over 7z by a ratio of 30:1. But the 7z files are
> around 20% smaller than the bz2 files. Given that installing 7z is
> trivial, yet still appears to be a massive barrier, we can assume that
> download size is almost irrelevant for our consumers.
>
> (Personal experience: download size has implications for both time
> taken and disk usage. But both are dwarfed by osm2pgsql
> (time), rendering tiles (both), uploading to my host (time), etc.)
>
> So given that the consumers react "inelastically" to download size, we
> should make sure the most consumer-friendly downloads are available at
> all times. This would appear to be full .bz2 planet dumps, and not
> either .7z or diffs.
>
> However, we have internal considerations - the hit on the db of
> generating the planet files, and where to store them. (Please note
> that bandwidth use from our servers is completely irrelevant, due to
> the university hosting). I hold planet generation of high importance
> to the project, since it can't be recreated independently (unlike, for
> example, cycle layers or t@h). It would seem a ripe target for a few
> hundred quid to get a box with a few terabytes of disk space that does
> nothing other than compress and serve full planets, or trade for some
> other resource off of dev.


This machine (http://munin.openstreetmap.org/openstreetmap/db2.openstreetmap.html)
can be used for this purpose. It's just sitting there helping to heat the
planet as far as I can see.


> If generating diffs directly from the db is a feasible way of
> extracting data more frequently, then it should be done, and the diffs
> used to generate full planets. (I'd love to see them daily, but I'm
> not sure the db would cope. Can it do stuff-modified-today more easily
> than full dumps?)
>
> If there's no way to generate diffs other than having two planets to
> start with, then we should still do so, but bear in mind that there
> appears to be very little demand for smaller downloads (cf. the 30:1 .bz2
> to .7z stats)
>
> Cheers,
> Andy
>
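Andy's suggestion of using diffs to generate full planets is the inverse operation: stream the previous full planet and a key-sorted change list side by side and write out the merged result, again in one sequential pass. Another minimal sketch under stated assumptions: apply_diff is a hypothetical function, elements are ((type_rank, element_id), serialized_xml) tuples, changes are (action, key, payload) tuples with action one of "create", "modify" or "delete", and both inputs are sorted by key:

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical representation: an element is ((type_rank, element_id), serialized_xml);
# a change is (action, key, payload) with action "create", "modify" or "delete".
Key = Tuple[int, int]
Element = Tuple[Key, str]
Change = Tuple[str, Key, str]


def apply_diff(base: Iterable[Element], changes: Iterable[Change]) -> Iterator[Element]:
    """Stream the next full planet from a base planet and a key-sorted change list.

    Both inputs must be sorted by key; the output comes out in the same order.
    """
    base_it, change_it = iter(base), iter(changes)
    b: Optional[Element] = next(base_it, None)
    c: Optional[Change] = next(change_it, None)
    while b is not None or c is not None:
        if c is None or (b is not None and b[0] < c[1]):
            # No change touches this element: copy it through unchanged.
            yield b
            b = next(base_it, None)
            continue
        action, key, payload = c
        if action not in ("create", "modify", "delete"):
            raise ValueError(f"unknown action {action!r}")
        if b is not None and b[0] == key:
            # Drop the old version; it is either replaced or deleted.
            b = next(base_it, None)
        if action != "delete":
            yield (key, payload)
        c = next(change_it, None)
```

This sketch is lenient: a "modify" for a key missing from the base is emitted as if it were a create, and a "delete" for a missing key is ignored. A production tool would probably want to report such mismatches instead.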
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
>

