[Geocoding] Nominatim script log output - how to tell progress?
Sarah Hoffmann
lonvia at denofr.de
Fri Jul 3 18:20:40 UTC 2015
On Fri, Jul 03, 2015 at 07:23:54AM +0100, Simon Nuttall wrote:
> Now it is showing these again:
>
> Done 274 in 136 @ 2.014706 per second - Rank 26 ETA (seconds): 2467.854004
>
> Presumably this means it is now playing catchup relative to the
> original download data?
I would suppose so.
> How can I tell what date it has caught up to? (And thus get an idea of
> when it is likely to finish?)
Have a look at the import_osmosis_log table. It gives you a good idea
of how long the batches take.
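Something along these lines shows both the date reached and the batch
timings (column names inferred from the INSERT statement you quote
further down, so double-check against your schema):

    -- newest batches first; batchend is the data date reached
    SELECT batchend, event, starttime, endtime,
           endtime - starttime AS duration
      FROM import_osmosis_log
     ORDER BY endtime DESC
     LIMIT 10;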
> Is it catching up by downloading minutely diffs or using larger
> intervals, then switching to minutely diffs when it is almost fully up
> to date?
That depends on how you have configured it. If it is set to the URL
of the minutely diffs, it will use minutely diffs but accumulate them
into batches of the size you have configured. Once it has caught up,
it will just accumulate the latest minutelies, so the batches become
smaller.
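For reference, with the stock osmosis setup (the command you quote
further down points its workingDirectory at the settings directory),
this is controlled by settings/configuration.txt. A sketch with
example values:

    # settings/configuration.txt -- example values only
    baseUrl=http://planet.openstreetmap.org/replication/minute
    # accumulate diffs into batches covering up to one day
    maxInterval=86400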
> This phase still seems very disk intensive, will that settle down and
> become much less demanding when it has eventually got up to date?
It will become less, but there will still be I/O going on. Given that
your initial import took about ten times as long as the best time I've
seen, it will probably take a long time to catch up. You should consider
running with --index-instances 2 while catching up, and you should
really investigate where the bottleneck in your system is.
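A sketch of both, assuming the standard utils/update.php invocation
(adapt the paths to your installation):

    # apply updates with two parallel indexing processes
    ./utils/update.php --import-osmosis-all --index-instances 2

    # meanwhile, watch the disks; a device stuck near 100% util
    # while indexing points to an I/O bottleneck
    iostat -x 5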
> Can the whole installed running Nominatim be copied to another
> machine? And set running?
>
> Presumably this is a database dump and copy - but how practical is that?
Yes, dump and restore is possible. You should be aware that indexes
are not dumped, so it still takes a day or two to restore the complete
database.
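A minimal sketch, assuming the database is called nominatim; the custom
dump format lets pg_restore rebuild the indexes in parallel, which
helps a lot:

    # on the source machine
    pg_dump -Fc -f nominatim.dump nominatim

    # on the target machine; the indexes are rebuilt here,
    # which is where the day or two goes
    createdb nominatim
    pg_restore -d nominatim -j 4 nominatim.dump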
> Are there alternative ideas such as replication or backup?
For backup you can do partial dumps that contain only the tables needed
for querying the database. These dumps can be restored faster, but they
are not updatable, so they are more of an interim solution to install
on a spare emergency server while the main DB is reimported.
The dump/backup script used for the osm.org servers can be found here:
https://github.com/openstreetmap/chef/blob/master/cookbooks/nominatim/templates/default/backup-nominatim.erb
If you go down that road, I recommend actually trying the restore
at least once, so you get an idea of the time and space requirements.
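The general shape of such a partial dump is below; the table list is
only illustrative, the authoritative list is in the backup-nominatim
script linked above:

    pg_dump -Fc -f nominatim-query.dump \
        -t placex -t search_name -t word \
        nominatim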
Replication is possible as well. In fact, the two osm.org servers have
been running as master and slave with streaming replication for about
two weeks now. You should disable writing logs to the database.
Otherwise the setup is fairly standard, largely following this
guide: https://wiki.postgresql.org/wiki/Streaming_Replication
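In outline, for the 9.x series, the relevant settings are roughly
these (host and user are placeholders):

    # master, postgresql.conf
    wal_level = hot_standby
    max_wal_senders = 3

    # slave, postgresql.conf
    hot_standby = on

    # slave, recovery.conf
    standby_mode = 'on'
    primary_conninfo = 'host=master.example.com user=replication'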
> > string(123) "INSERT INTO import_osmosis_log values
> > ('2015-06-08T07:58:02Z',25816916,'2015-07-03 06:07:34','2015-07-03
> > 06:44:10','index')"
> > 2015-07-03 06:44:10 Completed index step for 2015-06-08T07:58:02Z in
> > 36.6 minutes
> > 2015-07-03 06:44:10 Completed all for 2015-06-08T07:58:02Z in 58.05 minutes
> > 2015-07-03 06:44:10 Sleeping 0 seconds
> > /usr/local/bin/osmosis --read-replication-interval
> > workingDirectory=/home/nominatim/Nominatim/settings --simplify-change
> > --write-xml-change /home/nominatim/Nominatim/data/osmosischange.osc
> >
> > Which presumably means it is updating June 8th? (What else can I read
> > from this?)
See above: check out the import_osmosis_log table. The important thing
to take away is how long it takes to apply each update interval. If, on
average, applying the updates takes longer than the real time they
cover, you are in trouble.
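To put a number on it, something like this works (again using the
column names from the INSERT you quote; it only looks at the indexing
step, which your log shows dominating each batch):

    -- wall-clock time per batch vs. the stretch of data it covered
    SELECT batchend,
           endtime - starttime AS wall_time,
           batchend - lag(batchend) OVER (ORDER BY batchend) AS data_covered
      FROM import_osmosis_log
     WHERE event = 'index'
     ORDER BY batchend DESC
     LIMIT 20;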
> > Also, at what point is it safe to expose the Nominatim as a live service?
As soon as the import is finished. Search queries might interfere with
the updates when your server gets swarmed with lots of parallel queries,
but I doubt that you have enough traffic for that. Just make sure to keep
the number of requests that can hit the database in parallel at a moderate
level. Use php-fpm with limited pools for that and experiment with the
limits until you get maximum performance.
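A minimal pool sketch; the path is illustrative and the numbers are
just a starting point for your experiments:

    ; /etc/php5/fpm/pool.d/nominatim.conf (path is illustrative)
    [nominatim]
    user = www-data
    group = www-data
    listen = /var/run/php5-fpm-nominatim.sock
    pm = static
    ; hard cap on the requests that can hit the database in parallel
    pm.max_children = 10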
Sarah