[OSM-dev] Planet dump today delayed (ie not today) :-(
Jon Burgess
jburgess777 at googlemail.com
Thu Aug 23 00:50:26 BST 2007
On Wed, 2007-08-22 at 23:45 +0100, Tom Hughes wrote:
> In message <1187821946.5410.234.camel at localhost.localdomain>
> Jon Burgess <jburgess777 at googlemail.com> wrote:
>
> > Looking at the IO graphs I believe the planet.rb script is moving very
> > slowly because DB is completely IO bound. I suspect the 30 minute CPU
> > blips occur each time the planet.rb successfully fetches another page of
> > 500,000 entries.
>
> Interesting. The logical conclusion is that it is related to the
> changes that I made last week - it looks like the increase in InnoDB
> buffer space to 2Gb was done immediately after the planet dump last
> week. That does seem to have helped the API though, so it is swings
> and roundabouts.
I'm not sure it made that much difference. If you compare the rate of
change of disk utilisation on dev between this Weds and the same day
last week, they appear almost identical. According to the graph the
export started some time shortly after midnight and did not finish until
around midday. It had only been going for a few hours longer when it was
stopped today.
> > I guess it will start moving more rapidly again when the other DB/API
> > users start to drop off after midnight.
>
> I thought Sebastian had stopped it now anyway?
>
There is another planet.rb process running since around 16:30 so it must
have been restarted. This one has been running really slowly though.
Last time I looked it had only collected 1.4GB after running for ~7
hours wall time.
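For scale, a quick check of what that rate amounts to. This is only a rough sketch: it assumes "1.4GB" means 1.4 x 10^9 bytes and that the full 7 hours counts as run time; the function name is made up for illustration.

```python
def throughput_bytes_per_sec(bytes_collected: float, hours: float) -> float:
    """Average rate over the whole run, in bytes per second."""
    return bytes_collected / (hours * 3600)

# Assumed inputs: 1.4 GB (decimal) collected in about 7 hours of wall time.
rate = throughput_bytes_per_sec(1.4e9, 7)
print(f"{rate / 1e3:.1f} kB/s")  # 55.6 kB/s
```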
> > I'll try to find some time to see if the planet.rb script or SQL queries
> > can be made more IO efficient.
>
> They're pretty basic queries, so I doubt it.
Something must be not quite right though; surely it shouldn't take over
12 hours to collect the data. As you say, the job that the script is
doing is pretty trivial. I'd be tempted to suggest dumping each of the
relevant tables with mysqldump and then running a script over the SQL to
extract the data (or just importing into a DB on another machine which
can run without any other interruptions -- maybe it is the constant R/W
of the data during the queries which slows it down).
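A minimal sketch of the "run a script over the SQL" idea, not a tested replacement for planet.rb. It assumes mysqldump's default output, where each table's rows arrive as one-line `INSERT INTO ... VALUES (...),(...);` statements with backslash-escaped strings. The table name `nodes`, the column layout and the function names are hypothetical, chosen only for illustration.

```python
import re
from typing import Iterable, Iterator, List, Optional

# Assumed statement shape: INSERT INTO `nodes` VALUES (1,'a'),(2,'b');
INSERT_RE = re.compile(r"^INSERT INTO `(?P<table>\w+)` VALUES (?P<rows>.*);\s*$")

# Escapes that stand for another character; anything else stands for itself.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "0": "\0"}


def parse_rows(values: str) -> Iterator[List[Optional[str]]]:
    """Yield one list of column values per (...) tuple in a VALUES clause."""
    row: List[Optional[str]] = []
    field: List[str] = []
    in_row = in_str = quoted = False
    i = 0
    while i < len(values):
        ch = values[i]
        if in_str:
            if ch == "\\" and i + 1 < len(values):
                i += 1
                field.append(ESCAPES.get(values[i], values[i]))
            elif ch == "'":
                in_str = False
            else:
                field.append(ch)
        elif ch == "'":
            in_str = quoted = True
        elif ch == "(" and not in_row:
            in_row, row, field, quoted = True, [], [], False
        elif ch in ",)" and in_row:
            text = "".join(field)
            # An unquoted NULL is SQL NULL; a quoted 'NULL' is a plain string.
            row.append(None if text == "NULL" and not quoted else text)
            field, quoted = [], False
            if ch == ")":
                in_row = False
                yield row
        elif in_row:
            field.append(ch)
        i += 1


def rows_from_dump(lines: Iterable[str], table: str) -> Iterator[List[Optional[str]]]:
    """Stream rows for one table from the lines of a dump file."""
    for line in lines:
        m = INSERT_RE.match(line)
        if m and m.group("table") == table:
            yield from parse_rows(m.group("rows"))
```

Because it reads the dump line by line, something like `for row in rows_from_dump(open("dump.sql"), "nodes")` would never touch the live database, which is the point of the suggestion. All values come back as strings (or None), so any type conversion is left to the caller.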
> Osmosis seems to fare rather better though, as it managed to dump the
> database in about 3.5 hours this evening. It also does rather more
> complicated queries, but it does use the history tables (which are
> all MyISAM) rather than the live tables (which are mostly InnoDB).
Yes, Osmosis may well be the better long term answer.
Jon