[Geocoding] Import run for Europe
Simon Nuttall
info at cyclestreets.net
Mon Apr 29 22:42:43 UTC 2013
On 29 April 2013 23:34, Sarah Hoffmann <lonvia at denofr.de> wrote:
> On Mon, Apr 29, 2013 at 10:20:55PM +0100, Simon Nuttall wrote:
>> On 29 April 2013 08:30, Sarah Hoffmann <lonvia at denofr.de> wrote:
>> > Hi Simon,
>> >
>> > On Mon, Apr 29, 2013 at 03:39:37AM +0100, Simon Nuttall wrote:
>> >> Using our installation script mentioned at:
>> >> http://lists.openstreetmap.org/pipermail/geocoding/2013-April/000779.html
>> >>
>> >> I started a Europe-wide import on a 4-CPU machine with 16GB RAM.
>> >>
>> >> That was 13 days ago, on 16 April, and by 21 April the log file said:
>> >>
>> >> GRANT SELECT ON place_classtype_natural_islet TO "www-data";drop index
>> >> idx_placex_classtype;# Done special phrases Sun Apr 21 20:29:08 BST
>> >> 2013
>> >> # Nominatim website created Sun Apr 21 20:29:09 BST 2013
>> >
>> > It looks like your initial import has finished. After this line
>> > your script seems to start the update process directly, about here:
>> > https://github.com/cyclestreets/nominatim-install/blob/master/run.sh#L237
>> >
>> > This line actually runs the update process in an endless loop, so your
>> > script will never finish and is happily keeping the database up to date
>> > now. But the database should already be ready to be used.
>>
>> Yeah I did try using that for a while:
>>
>> http://nominatim.cyclestreets.net/
>>
>> but because the script was still running and I/O bound, the
>> performance was poor. So I decided to stop using it and wait until the
>> script finished.
>
> It's mostly osm2pgsql that causes responsiveness to decrease. It helps a
> lot to keep the updates small. Basically, I wouldn't recommend importing
> updates of more than an hour at a time, even when the machine is offline and
> just catching up. The osm instance is currently set to half-hour chunks,
> which seems to work well even when it is under heavy load.
>
> You seem to have set your system to arbitrarily large chunks. Not sure
> how that happened. You'll find the setting in settings/configuration.txt
> under 'maxInterval'.
Here is that file:
# The URL of the directory containing change files.
baseUrl=http://planet.openstreetmap.org/replication/minute
# Defines the maximum time interval in seconds to download in a single invocation.
# Setting to 0 disables this feature.
maxInterval = 3600
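
Following Sarah's half-hour-chunk suggestion, the change would presumably be the following edit to settings/configuration.txt (1800 seconds mirrors the osm.org setting she describes; it is not a value confirmed elsewhere in this thread):

```
# Defines the maximum time interval in seconds to download in a single invocation.
# Half-hour chunks, matching the osm.org instance:
maxInterval = 1800
```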
>
>> >> Processing: Node(307016k 2.2k/s) Way(39382k 0.09k/s) Relation(118300 14.95/s)
>> >
>> > Now that is odd. Normally, an update step just imports a few k of data
>> > at most. Here you seem to reimport almost the entire planet again. It looks
>> > like the state file was not set up correctly and you have been applying
>> > updates of the last half year.
>> >
>> > It's hard to say how long the update step above will continue. It is probably
>> > almost done. So you might just let it run for a bit more but be aware that
>> > the osm2pgsql processing you see above is followed by indexing which will
>> > again take quite a bit of time.
>> >
>> > It might work to just cancel the process and let the DB roll back. If you
>> > do that, check afterwards that all rows in the placex table have
>> > indexed_status = 0. If that is the case, you should be safely back to the
>> > state of your initial import. Then you can try to figure out what the
>> > problem with the state file is (settings/state.txt), get the correct one
>> > and restart your updates.
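
The check Sarah describes could be sketched as the following query (table and column names are taken from her message; the database to run it against would be whichever one the install created):

```sql
-- Should return 0 if the rollback left the initial import intact,
-- i.e. every placex row has indexed_status = 0:
SELECT count(*) FROM placex WHERE indexed_status <> 0;
```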
>>
>> I've let it carry on running, and you're right, it seems like it's doing
>> the whole import again. Currently I'm seeing lines like:
>>
>> Done 1118475 in 44519 @ 25.123543 per second - Rank 26 ETA
>> (seconds): 152234.937500
>> Done 1118482 in 44519 @ 25.123699 per second - Rank 26 ETA
>> (seconds): 152233.718750
>> Done 1118489 in 44520 @ 25.123293 per second - Rank 26 ETA
>> (seconds): 152235.890625
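
The progress lines above follow from simple arithmetic: rate = rows done / elapsed seconds, and ETA = rows remaining / rate. A minimal sketch of that calculation (the per-rank row total does not appear in the log, so the figure below is hypothetical, chosen to roughly match the excerpt):

```python
# Sketch of the arithmetic behind Nominatim's indexing progress lines, e.g.
# "Done 1118475 in 44519 @ 25.123543 per second - Rank 26 ETA (seconds): ..."
# The total row count for the rank is NOT shown in the log; 4943172 below is
# a hypothetical figure picked so the numbers resemble the excerpt above.

def indexing_progress(done: int, elapsed_s: int, total: int):
    """Return (rows indexed per second, estimated seconds remaining)."""
    rate = done / elapsed_s
    eta = (total - done) / rate
    return rate, eta

rate, eta = indexing_progress(done=1118475, elapsed_s=44519, total=4943172)
print(f"@ {rate:.6f} per second - ETA (seconds): {eta:.1f}")
```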
>>
>> The settings/state.txt file reads:
>>
>> #Mon Apr 22 07:40:54 BST 2013
>> sequenceNumber=247388
>> timestamp=2013-03-03T20\:59\:02Z
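
The state file above is in osmosis's Java-Properties format, with colons backslash-escaped. A small sketch of parsing it and measuring how far behind the timestamp is (the 22 April reference date is taken from the file's own comment line; treating it as UTC is an approximation):

```python
# Parse an osmosis state.txt (Java Properties format; ':' is
# backslash-escaped) and compute how far behind the replication is.
from datetime import datetime, timezone

def parse_state(text: str) -> dict:
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip the comment line osmosis writes at the top
        key, _, value = line.partition("=")
        state[key.strip()] = value.strip().replace("\\:", ":")
    return state

# The file from the thread, verbatim:
sample = """#Mon Apr 22 07:40:54 BST 2013
sequenceNumber=247388
timestamp=2013-03-03T20\\:59\\:02Z
"""

state = parse_state(sample)
last_applied = datetime.strptime(
    state["timestamp"], "%Y-%m-%dT%H:%M:%SZ"
).replace(tzinfo=timezone.utc)
# Reference date from the comment line (approximated as UTC midnight):
lag = datetime(2013, 4, 22, tzinfo=timezone.utc) - last_applied
print(state["sequenceNumber"], f"about {lag.days} days behind")
```

This also illustrates why the situation looked wrong: the file was written on 22 April but still pointed at early March.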
>
> As far as I know, osmosis should have updated that by now, which would mean
> that you have imported around 4 months' worth of updates and are still only
> in March. Hard to say what went wrong without knowing what exactly was in
> the state file before the update started. You'll probably be faster with a
> reimport at this point.
OK, well I could cancel it, as our new machine will become available in a
couple of days.
>
>> > NB: Geofabrik now has daily diffs for excerpts, which will work fine with
>> > Nominatim. Maybe that is of interest for you. You need to get the latest
>> > Nominatim version from github, though, and you probably need to reimport
>> > because the DB schema has changed a little bit.
>>
>> A new machine has become available to us and I'll try installing the
>> latest version on that. We have the luxury of choosing anywhere
>> between 16 GB and 64 GB RAM for the Nominatim VM although I'd like to
>> choose at most 32GB to allow more memory for our other VMs on that
>> host.
>>
>> Does it make a lot of difference? Would 32GB be enough to process the
>> whole planet or should I just stick to Europe?
>
> 32GB should be OK for a planet. The bigger limiting factor is I/O.
> If you could add an SSD to the machine that would give you a huge
> performance boost, especially when running minutely updates in the background.
Noted, thanks.