[Geocoding] Nominatim script log output - how to tell progress?

Simon Nuttall info at cyclestreets.net
Sat Jul 4 09:47:27 UTC 2015


On 3 July 2015 at 19:20, Sarah Hoffmann <lonvia at denofr.de> wrote:
> On Fri, Jul 03, 2015 at 07:23:54AM +0100, Simon Nuttall wrote:
>> Now it is showing these again:
>>
>>   Done 274 in 136 @ 2.014706 per second - Rank 26 ETA (seconds): 2467.854004
>>
>> Presumably this means it is now playing catchup relative to the
>> original download data?
>
> I would suppose so.
>
>> How can I tell what date it has caught up to? (And thus get an idea of
>> when it is likely to finish?)
>
> Have a look at the import_osmosis_log table. It gives you a good idea
> how long the batches take.

Ah yes - pretty slow :-(

nominatim=# select * from import_osmosis_log order by endtime desc limit 12;
      batchend       | batchsize |      starttime      |       endtime       |   event
---------------------+-----------+---------------------+---------------------+-----------
 2015-06-09 12:54:02 |  40037028 | 2015-07-04 09:30:16 | 2015-07-04 09:30:29 | osmosis
 2015-06-09 11:55:01 |  36866133 | 2015-07-04 08:57:52 | 2015-07-04 09:30:16 | index
 2015-06-09 11:55:01 |  36866133 | 2015-07-04 08:34:17 | 2015-07-04 08:57:52 | osm2pgsql
 2015-06-09 11:55:01 |  36866133 | 2015-07-04 08:34:06 | 2015-07-04 08:34:17 | osmosis
 2015-06-09 10:55:02 |  42220289 | 2015-07-04 08:06:14 | 2015-07-04 08:34:06 | index
 2015-06-09 10:55:02 |  42220289 | 2015-07-04 07:41:23 | 2015-07-04 08:06:14 | osm2pgsql
 2015-06-09 10:55:02 |  42220289 | 2015-07-04 07:41:11 | 2015-07-04 07:41:23 | osmosis
 2015-06-09 09:55:02 |  34076756 | 2015-07-04 07:14:30 | 2015-07-04 07:41:11 | index
 2015-06-09 09:55:02 |  34076756 | 2015-07-04 06:53:59 | 2015-07-04 07:14:30 | osm2pgsql
 2015-06-09 09:55:02 |  34076756 | 2015-07-04 06:53:49 | 2015-07-04 06:53:59 | osmosis
 2015-06-09 08:56:01 |  26087298 | 2015-07-04 06:20:20 | 2015-07-04 06:53:49 | index
 2015-06-09 08:56:01 |  26087298 | 2015-07-04 06:07:22 | 2015-07-04 06:20:20 | osm2pgsql
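
A rough way to put a number on the catch-up rate from that table (just a
sketch against the columns shown above): compare how much map time a
stretch of batches covers with how much wall time it took.

-- rough catch-up check: OSM data time covered vs. wall time spent recently
SELECT max(batchend) - min(batchend)  AS map_time_covered,
       max(endtime)  - min(starttime) AS wall_time_spent
  FROM import_osmosis_log
 WHERE endtime > now() - interval '6 hours';

On the rows above that works out at roughly four hours of map data for
about three and a half hours of wall time, so it is gaining on real time,
but not by much.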


>
>> Is it catching up by downloading minutely diffs, or by using larger
>> intervals and then switching to minutely diffs when it is almost fully
>> up to date?
>
> That depends on how you have configured it. If it is set to the URL
> of the minutelies it will use minutely diffs but accumulate them
> into batches of the size you have configured. When it has caught up
> it will just accumulate the latest minutelies, so batches become
> smaller.

Ah yes, I see the configuration.txt in the settings working directory has
the replication base URL and the maximum batch interval.


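For reference, with illustrative values only (not necessarily what is set
on this install), the configuration.txt that osmosis keeps in its
replication working directory (here /home/nominatim/Nominatim/settings)
normally looks something like this:

# URL of the replication diffs being followed
baseUrl=http://planet.openstreetmap.org/replication/minute

# maximum amount of diff data, in seconds, pulled into one batch
maxInterval=3600

The roughly hourly steps in the batchend column above would fit a
maxInterval of 3600.
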
>
>> This phase still seems very disk-intensive. Will that settle down and
>> become much less demanding when it has eventually got up to date?
>
> It will become less, but there is still I/O going on. Given that your
> initial import took about 10 times as long as the best time I've seen,
> it will probably take a long time to catch up. You should consider
> running with --index-instances 2 while catching up, and you should
> really investigate where the bottleneck in the system is.
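
That flag presumably goes on the regular update invocation, something
along these lines (assuming the stock utils/update.php update loop; check
./utils/update.php --help for the exact option names in your version):

./utils/update.php --import-osmosis-all --index-instances 2
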
>
>> Can the whole installed, running Nominatim be copied to another
>> machine and set running there?
>>
>> Presumably this is a database dump and copy - but how practical is that?
>
> Yes, dump and restore is possible. You should be aware that indexes
> are not dumped, so it still takes a day or two to restore the complete
> database.
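
For the full dump-and-restore route, the standard pg_dump/pg_restore pair
should be enough; a sketch only, with the database name assumed to be
nominatim:

# custom-format dump; index contents are not included, only their definitions
pg_dump -Fc nominatim > nominatim.dump

# on the target machine: create the database and restore with parallel jobs,
# which is where the day or two of index rebuilding goes
createdb nominatim
pg_restore -d nominatim -j 4 nominatim.dump
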
>
>> Are there alternative ideas such as replication or backup?
>
> For backup you can do partial dumps that contain only the tables needed
> for querying the database. These dumps can be restored faster, but
> they are not updatable, so they are more of an interim solution
> to install on a spare emergency server while the main DB is reimported.
> The dump/backup script used for the osm.org servers can be found here:
>
> https://github.com/openstreetmap/chef/blob/master/cookbooks/nominatim/templates/default/backup-nominatim.erb
>
> If you go down that road, I recommend actually trying the restore
> at least once, so you get an idea about the time and space requirements.
>
> Replication is possible as well. In fact, the two osm.org servers have
> been running as master and slave with streaming replication for about
> two weeks now. You should disable writing logs to the database.
> Otherwise the setup is fairly standard, following largely this
> guide: https://wiki.postgresql.org/wiki/Streaming_Replication
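
Disabling the log writing is presumably the CONST_Log_DB setting in the
local Nominatim settings (an assumption, check settings/local.php). On the
PostgreSQL side (the 9.x releases current at the time), the standby end of
that guide comes down to roughly this recovery.conf, plus
wal_level = hot_standby and max_wal_senders on the master and
hot_standby = on on the standby; host and user below are placeholders:

# recovery.conf on the standby, per the streaming replication guide
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432 user=replicator'
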
>
>> > string(123) "INSERT INTO import_osmosis_log values
>> > ('2015-06-08T07:58:02Z',25816916,'2015-07-03 06:07:34','2015-07-03
>> > 06:44:10','index')"
>> > 2015-07-03 06:44:10 Completed index step for 2015-06-08T07:58:02Z in
>> > 36.6 minutes
>> > 2015-07-03 06:44:10 Completed all for 2015-06-08T07:58:02Z in 58.05 minutes
>> > 2015-07-03 06:44:10 Sleeping 0 seconds
>> > /usr/local/bin/osmosis --read-replication-interval
>> > workingDirectory=/home/nominatim/Nominatim/settings --simplify-change
>> > --write-xml-change /home/nominatim/Nominatim/data/osmosischange.osc
>> >
>> > Which presumably means it is updating June 8th? (What else can I read
>> > from this?)
>
> See above, check out the import_osmosis_log. The important thing to take
> away is how long it takes to process each update interval. If on average
> the import takes longer than real time, you are in trouble.
>
>> > Also, at what point is it safe to expose the Nominatim as a live service?
>
> As soon as the import is finished. Search queries might interfere with
> the updates if your server gets swarmed with lots of parallel queries,
> but I doubt that you have enough traffic for that. Just make sure to keep
> the number of requests that can hit the database in parallel at a moderate
> level. Use php-fpm with limited pools for that and experiment with the
> limits until you get the maximum performance.
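
A minimal sketch of such a pool limit, with an illustrative worker count
(put it in whatever php-fpm pool serves the Nominatim PHP scripts):

; php-fpm pool for the Nominatim website scripts
pm = static
pm.max_children = 10

Then experiment with pm.max_children as Sarah suggests.
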
>
> Sarah



-- 
Simon Nuttall

Route Master, CycleStreets.net


