[Geocoding] Import run for Europe

Simon Nuttall info at cyclestreets.net
Mon Apr 29 21:20:55 UTC 2013


On 29 April 2013 08:30, Sarah Hoffmann <lonvia at denofr.de> wrote:
> Hi Simon,
>
> On Mon, Apr 29, 2013 at 03:39:37AM +0100, Simon Nuttall wrote:
>> Using our installation script mentioned at:
>> http://lists.openstreetmap.org/pipermail/geocoding/2013-April/000779.html
>>
>> I started a Europe-wide import on a 4-CPU machine with 16GB RAM.
>>
>> That was 13 days ago, on 16 April, and by 21 April the log file said:
>>
>> GRANT SELECT ON place_classtype_natural_islet TO "www-data";drop index
>> idx_placex_classtype;#   Done special phrases Sun Apr 21 20:29:08 BST
>> 2013
>> #       Nominatim website created Sun Apr 21 20:29:09 BST 2013
>
> It looks like your initial import has finished. After this line
> your script seems to start the update process directly, about here:
> https://github.com/cyclestreets/nominatim-install/blob/master/run.sh#L237
>
> This line actually runs the update process in an endless loop, so your
> script will never finish and is happily keeping the database up to date
> now. But the database should already be ready to be used.

Yeah I did try using that for a while:

http://nominatim.cyclestreets.net/

but because the script was still running and was I/O bound, the
performance was slow. So I decided to stop using it and wait until the
script finished.

>
>> and a week later it is still running, saying:
>>
>> Processing: Node(307016k 2.2k/s) Way(39382k 0.09k/s) Relation(118210
>> 14.94/s)NOTICE:  Self-intersection at or near point 52.2009 56.0652
>> CONTEXT:  PL/pgSQL function "place_insert" line 28 at IF
>> COPY place, line 3: "R  1076747 boundary        administrative
>> "name"=>"Новогорское"   8       \N      \N      \N      \N      \N
>>          SRID=4326;POLYG..."
>> Processing: Node(307016k 2.2k/s) Way(39382k 0.09k/s) Relation(118300 14.95/s)
>
> Now that is odd. Normally, an update step just imports a few k of data
> at most. Here you seem to reimport almost the entire planet again. It looks
> like the state file was not set up correctly and you have been applying
> updates of the last half year.
>
> It's hard to say how long the update step above will continue. It is probably
> almost done. So you might just let it run for a bit more but be aware that
> the osm2pgsql processing you see above is followed by indexing which will
> again take quite a bit of time.
>
> It might work to just cancel the process and let the DB roll back. If you
> do that, check afterwards whether all rows in the placex table have
> indexed_status = 0. If that is the case, you should be safely back to the
> state of your initial import. Then you can try to figure out what the
> problem with the state file is (settings/state.txt), get the correct one
> and restart your updates.
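
The check described above could be sketched roughly as follows. This is
only an illustration: the table and column names (placex, indexed_status)
come from the advice above, but the helper names are hypothetical, and an
in-memory SQLite database stands in for the real PostgreSQL one. The same
function should accept any DB-API connection (for example one opened with
psycopg2, which is an assumption here).

```python
import sqlite3

# Rows that still need indexing are those whose indexed_status is not 0.
PENDING_SQL = "SELECT count(*) FROM placex WHERE indexed_status <> 0"


def count_pending(conn):
    """Return how many placex rows have indexed_status other than 0."""
    cur = conn.cursor()
    cur.execute(PENDING_SQL)
    return cur.fetchone()[0]


def make_demo_db(statuses):
    """Hypothetical helper: build a tiny stand-in placex table in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE placex (place_id INTEGER, indexed_status INTEGER)")
    conn.executemany(
        "INSERT INTO placex VALUES (?, ?)",
        list(enumerate(statuses, start=1)),
    )
    return conn


# Made-up sample data: two indexed rows and one still pending.
demo = make_demo_db([0, 0, 2])
print("rows with indexed_status <> 0:", count_pending(demo))
```

A result of 0 would correspond to the "safely back to the initial import"
case; anything else means some rows are still waiting to be indexed.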

I've let it carry on running, and you're right, it seems like it's doing
the whole import again. Currently I'm seeing lines like:

  Done 1118475 in 44519 @ 25.123543 per second - Rank 26 ETA (seconds): 152234.937500
  Done 1118482 in 44519 @ 25.123699 per second - Rank 26 ETA (seconds): 152233.718750
  Done 1118489 in 44520 @ 25.123293 per second - Rank 26 ETA (seconds): 152235.890625
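
To put those ETA figures into more readable units, a small sketch like
the one below could be used. It assumes the progress lines always have
the shape shown above; the function names are hypothetical, and the
pattern tolerates the line wrap that mail clients introduce before
"(seconds)".

```python
import re

# Pattern for the indexing progress lines quoted above. \s+ allows the
# line break that mail wrapping inserts between "ETA" and "(seconds)".
PROGRESS_RE = re.compile(
    r"Done\s+(?P<done>\d+)\s+in\s+(?P<elapsed>\d+)\s+@\s+(?P<rate>[\d.]+)"
    r"\s+per\s+second\s+-\s+Rank\s+(?P<rank>\d+)\s+ETA\s+\(seconds\):"
    r"\s+(?P<eta>[\d.]+)"
)


def parse_progress(line):
    """Return the fields of one progress line as a dict, or None if no match."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return {
        "done": int(m.group("done")),
        "elapsed_s": int(m.group("elapsed")),
        "rate": float(m.group("rate")),
        "rank": int(m.group("rank")),
        "eta_s": float(m.group("eta")),
    }


def eta_days(line):
    """Convert the ETA of one progress line from seconds to days."""
    fields = parse_progress(line)
    return None if fields is None else fields["eta_s"] / 86400.0


sample = ("Done 1118475 in 44519 @ 25.123543 per second - Rank 26 "
          "ETA (seconds): 152234.937500")
print(round(eta_days(sample), 2))  # 1.76
```

So the first line above corresponds to an estimate of about 1.76 days
remaining for rank 26 alone.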

The settings/state.txt file reads:

#Mon Apr 22 07:40:54 BST 2013
sequenceNumber=247388
timestamp=2013-03-03T20\:59\:02Z
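
For anyone wanting to check how far behind a state file is, here is a
minimal sketch. It assumes the file consists only of "#" comment lines
and key=value pairs, with colons escaped as "\:" as in the contents
above; the function names and the choice of reference time (the date of
this message) are illustrative assumptions.

```python
from datetime import datetime, timezone


def parse_state(text):
    """Parse the key=value lines of a state file, skipping '#' comments."""
    state = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        # Undo the "\:" escaping seen in the timestamp value.
        state[key.strip()] = value.strip().replace("\\:", ":")
    return state


def state_lag_days(text, now):
    """Return how many days the state timestamp lies before 'now' (UTC)."""
    state = parse_state(text)
    stamp = datetime.strptime(state["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    stamp = stamp.replace(tzinfo=timezone.utc)
    return (now - stamp).total_seconds() / 86400.0


# The file contents quoted above, as a Python string literal.
STATE = """#Mon Apr 22 07:40:54 BST 2013
sequenceNumber=247388
timestamp=2013-03-03T20\\:59\\:02Z
"""

# Reference time: when this message was sent.
sent = datetime(2013, 4, 29, 21, 20, 55, tzinfo=timezone.utc)
print(parse_state(STATE)["sequenceNumber"])
print(round(state_lag_days(STATE, sent), 1), "days behind")
```

With those inputs the timestamp comes out roughly 57 days behind the
time of writing.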


> NB: Geofabrik now has daily diffs for excerpts which will work fine with
> Nominatim. Maybe that is of interest for you. You need to get the latest
> Nominatim version from GitHub, though, and you probably need to reimport
> because the DB schema has changed a little bit.

A new machine has become available to us and I'll try installing
the latest version on that. We have the luxury of choosing anywhere
between 16 GB and 64 GB RAM for the Nominatim VM, although I'd like to
choose at most 32 GB to allow more memory for our other VMs on that
host.

Does the amount of RAM make a lot of difference? Would 32 GB be enough
to process the whole planet, or should I just stick to Europe?


