[Geocoding] Import run for Europe
Sarah Hoffmann
lonvia at denofr.de
Mon Apr 29 22:34:16 UTC 2013
On Mon, Apr 29, 2013 at 10:20:55PM +0100, Simon Nuttall wrote:
> On 29 April 2013 08:30, Sarah Hoffmann <lonvia at denofr.de> wrote:
> > Hi Simon,
> >
> > On Mon, Apr 29, 2013 at 03:39:37AM +0100, Simon Nuttall wrote:
> >> Using our installation script mentioned at:
> >> http://lists.openstreetmap.org/pipermail/geocoding/2013-April/000779.html
> >>
> >> I started a Europe wide import on a 4 CPU processor machine with 16GB RAM.
> >>
> >> That was 13 days ago, on 16 April, and by 21 April the log file said:
> >>
> >> GRANT SELECT ON place_classtype_natural_islet TO "www-data";drop index
> >> idx_placex_classtype;# Done special phrases Sun Apr 21 20:29:08 BST
> >> 2013
> >> # Nominatim website created Sun Apr 21 20:29:09 BST 2013
> >
> > It looks like your initial import has finished. After this line
> > your script seems to start the update process directly, about here:
> > https://github.com/cyclestreets/nominatim-install/blob/master/run.sh#L237
> >
> > This line actually runs the update process in an endless loop, so your
> > script will never finish and is happily keeping the database up to date
> > now. But the database should already be ready to be used.
>
> Yeah I did try using that for a while:
>
> http://nominatim.cyclestreets.net/
>
> but because the script was still running and was I/O bound, the
> performance was slow. So I decided to stop and wait until the script
> finished.
It's mostly osm2pgsql that causes responsiveness to decrease. It helps a
lot to keep the updates small. Basically, I wouldn't recommend importing
more than an hour's worth of updates at a time, even when the machine is
offline and just catching up. The osm.org instance is currently set to
half-hour chunks, which seems to work well even when it is under heavy load.
You seem to have set your system to arbitrarily large chunks. Not sure
how that happened. You'll find the setting in settings/configuration.txt
under 'maxInterval'.
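If I remember the file layout right, it is a plain osmosis properties file
and looks something like this (maxInterval is in seconds; the baseUrl shown
here is the minutely planet feed, adjust it to whatever source you use):

```
# settings/configuration.txt -- osmosis replication settings
baseUrl=http://planet.openstreetmap.org/replication/minute
# fetch at most half an hour (1800 seconds) of diffs per run
maxInterval=1800
```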
> >> Processing: Node(307016k 2.2k/s) Way(39382k 0.09k/s) Relation(118300 14.95/s)
> >
> > Now that is odd. Normally, an update step just imports a few k of data
> > at most. Here you seem to reimport almost the entire planet again. It looks
> > like the state file was not set up correctly and you have been applying
> > updates of the last half year.
> >
> > It's hard to say how long the update step above will continue. It is probably
> > almost done. So you might just let it run for a bit more but be aware that
> > the osm2pgsql processing you see above is followed by indexing which will
> > again take quite a bit of time.
> >
> > It might work to just cancel the process and let the DB roll back. If you
> > do that, check afterwards that all rows in the placex table have
> > indexed_status = 0. If that is the case, you should be safely back to the
> > state of your initial import. Then you can try to figure out what the
> > problem with the state file is (settings/state.txt), get the correct one
> > and restart your updates.
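For the record, the check I mentioned above can be done with a one-liner
along these lines (assuming your database is called 'nominatim'; adjust
the name to your setup):

```shell
# Count rows that are still waiting to be indexed.
# 0 means the rollback left the database fully indexed.
psql -d nominatim -t -c \
  "SELECT count(*) FROM placex WHERE indexed_status != 0;"
```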
>
> I've let it carry on running, and you're right, it seems like it's doing
> the whole import again. Currently I'm seeing lines like:
>
> Done 1118475 in 44519 @ 25.123543 per second - Rank 26 ETA
> (seconds): 152234.937500
> Done 1118482 in 44519 @ 25.123699 per second - Rank 26 ETA
> (seconds): 152233.718750
> Done 1118489 in 44520 @ 25.123293 per second - Rank 26 ETA
> (seconds): 152235.890625
>
> The settings/state.txt file reads:
>
> #Mon Apr 22 07:40:54 BST 2013
> sequenceNumber=247388
> timestamp=2013-03-03T20\:59\:02Z
As far as I know, osmosis should have updated that by now, which would mean
that you have imported around 4 months' worth of updates and are still only
in March. It's hard to say what went wrong without knowing what exactly was
in the state file before the update started. You'll probably be faster with
a reimport at this point.
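You can see how far behind the replication is by comparing the timestamp in
settings/state.txt (the backslashes before the colons are just escaping)
against the current time. A quick sketch:

```python
from datetime import datetime, timezone

# Timestamp from settings/state.txt, with the escaped colons unescaped.
state_ts = "2013-03-03T20:59:02Z"

# "Now" from the server's point of view -- here the date of this mail.
now = datetime(2013, 4, 29, 22, 34, 16, tzinfo=timezone.utc)

last_applied = datetime.strptime(
    state_ts, "%Y-%m-%dT%H:%M:%SZ"
).replace(tzinfo=timezone.utc)

lag = now - last_applied
print(f"replication lag: {lag.days} days")  # 57 days behind
```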
> > NB: Geofabrik now has daily diffs for excerpts which will work fine with
> > Nominatim. Maybe that is of interest for you. You need to get the latest
> > Nominatim version from github, though, and you probably need to reimport
> > because the DB schema has changed a little bit.
>
> A new machine has become available to us and I'll try installing
> the latest version on that. We have the luxury of choosing anywhere
> between 16 GB and 64 GB RAM for the Nominatim VM although I'd like to
> choose at most 32GB to allow more memory for our other VMs on that
> host.
>
> Does it make a lot of difference? Would 32GB be enough to process the
> whole planet or should I just stick to Europe?
32GB should be ok for a planet. The limiting factor is more likely I/O.
If you could add an SSD to the machine, that would give you a huge
performance boost, especially when running minutely updates in the background.
Sarah