[OSM-dev] New, faster, planet dump tool

Tue Sep 25 12:30:55 BST 2007

On 25/09/2007, Brett Henderson <brett at bretth.com> wrote:
> spaetz wrote:
> > On Mon, Sep 24, 2007 at 11:35:38PM +0100, Jon Burgess wrote:
> >
> >> I've just added a C implementation of the planet.rb script into SVN. The
> >> new code is approximately 10 times faster in my tests:
> >>
> >
> > Sounds nice. So far my plan was to switch to osmosis for the planet dumps, which is significantly faster than the ruby planet dump as well. I don't know whether you guys want to coordinate on whether the C implementation is worth the effort, rather than focusing on osmosis for now.
> >
> > I'll use whatever is faster and reliably produces a planet output.
> >
> > spaetz
> >
> If planet.c is truly quicker then it makes sense to use it.  osmosis is
> including user information in the dump which I thought would be useful
> but that should be easy to add to planet.c as well.
>
> I've done my best to make osmosis as flexible as possible for a wide
> range of usage scenarios, I was hoping it could become a universal tool
> for shifting OSM data around.  While there's no harm in having competing
> tools for the same job, it does raise the issue of maintaining all of
> these tools when the schema changes however ...
>
> At the end of the day I don't really mind which tool is used.  osmosis
> is aiming to eliminate planet dumping altogether, how we get there
> doesn't matter.

I wrote this version because it looked like the runtime of planet.rb
was becoming an issue again. The code should be faster than Osmosis,
but I have not run a specific test with the current data set. I expect
the bzip2 compression time to dominate the planet dump time.
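
A rough way to check where the time goes is to time the same
serialisation with and without bzip2. The sketch below is only an
illustration in Python, not code from planet.c or Osmosis: the rows
are made up, the <node> layout is simplified, and node_rows,
write_planet and timed_dump are hypothetical names.

```python
import bz2
import io
import time
from xml.sax.saxutils import quoteattr


def node_rows(count):
    """Yield made-up (id, lat, lon) tuples standing in for database rows."""
    for i in range(count):
        yield i + 1, 51.0 + i * 1e-5, -0.1 - i * 1e-5


def write_planet(rows, sink):
    """Write rows as simplified <node> elements to a binary sink.

    Returns the number of nodes written.
    """
    sink.write(b'<?xml version="1.0" encoding="UTF-8"?>\n<osm>\n')
    written = 0
    for node_id, lat, lon in rows:
        line = '  <node id=%s lat=%s lon=%s/>\n' % (
            quoteattr(str(node_id)),
            quoteattr('%.7f' % lat),
            quoteattr('%.7f' % lon),
        )
        sink.write(line.encode('utf-8'))
        written += 1
    sink.write(b'</osm>\n')
    return written


def timed_dump(count, compress):
    """Serialise `count` rows into memory, optionally through bzip2.

    Returns (nodes written, output bytes, elapsed seconds).
    """
    raw = io.BytesIO()
    # BZ2File leaves the underlying file object open when it is closed,
    # so raw.getvalue() is still usable afterwards.
    sink = bz2.BZ2File(raw, 'wb') if compress else raw
    start = time.perf_counter()
    written = write_planet(node_rows(count), sink)
    if compress:
        sink.close()  # flush the final compressed block into raw
    elapsed = time.perf_counter() - start
    return written, raw.getvalue(), elapsed


if __name__ == '__main__':
    for compress in (False, True):
        n, data, secs = timed_dump(200000, compress)
        label = 'bzip2' if compress else 'plain'
        print('%s: %d nodes, %d bytes, %.2fs' % (label, n, len(data), secs))
```

The timings depend on the machine and the data, so the numbers from a
toy run like this say nothing definite about a real planet dump.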

planet.c and osmosis do have slightly different aims. The osmosis code
is a far more generic implementation but I suspect its speed will
never be able to match a custom C implementation. On the other hand,
generating diffs directly from the DB using Osmosis would be far
quicker than using the current planet dump + planetdiff tools.

I'd be happy to use either tool provided they both achieve the same
result within time and memory constraints.

Another possibility is to use the planet.c code to stream a DB dump
into the PostgreSQL Mapnik database. Avoiding the bzip2 compression
should allow this to be done quite rapidly. We could then update the
Mapnik layer more frequently than the formal weekly planet dump.

  Jon