[Geocoding] Nominatim script log output - how to tell progress?

Simon Nuttall info at cyclestreets.net
Sat Jul 4 09:58:46 UTC 2015


On 4 July 2015 at 10:52, Simon Nuttall <info at cyclestreets.net> wrote:
> On 4 July 2015 at 10:47, Simon Nuttall <info at cyclestreets.net> wrote:
>> On 3 July 2015 at 19:20, Sarah Hoffmann <lonvia at denofr.de> wrote:
>>> On Fri, Jul 03, 2015 at 07:23:54AM +0100, Simon Nuttall wrote:
>>>> Now it is showing these again:
>>>>
>>>>   Done 274 in 136 @ 2.014706 per second - Rank 26 ETA (seconds): 2467.854004
>>>>
>>>> Presumably this means it is now playing catchup relative to the
>>>> original download data?
>>>
>>> I would suppose so.
>>>
>>>> How can I tell what date it has caught up to? (And thus get an idea of
>>>> when it is likely to finish?)
>>>
>>> Have a look at the import_osmosis_log table. It gives you a good idea
>>> how long the batches take.
>>
>> Ah yes - pretty slow :-(
>>
>> nominatim=# select * from import_osmosis_log order by endtime desc limit 12;
>>       batchend       | batchsize |      starttime      |       endtime       |   event
>> ---------------------+-----------+---------------------+---------------------+-----------
>>  2015-06-09 12:54:02 |  40037028 | 2015-07-04 09:30:16 | 2015-07-04 09:30:29 | osmosis
>>  2015-06-09 11:55:01 |  36866133 | 2015-07-04 08:57:52 | 2015-07-04 09:30:16 | index
>>  2015-06-09 11:55:01 |  36866133 | 2015-07-04 08:34:17 | 2015-07-04 08:57:52 | osm2pgsql
>>  2015-06-09 11:55:01 |  36866133 | 2015-07-04 08:34:06 | 2015-07-04 08:34:17 | osmosis
>>  2015-06-09 10:55:02 |  42220289 | 2015-07-04 08:06:14 | 2015-07-04 08:34:06 | index
>>  2015-06-09 10:55:02 |  42220289 | 2015-07-04 07:41:23 | 2015-07-04 08:06:14 | osm2pgsql
>>  2015-06-09 10:55:02 |  42220289 | 2015-07-04 07:41:11 | 2015-07-04 07:41:23 | osmosis
>>  2015-06-09 09:55:02 |  34076756 | 2015-07-04 07:14:30 | 2015-07-04 07:41:11 | index
>>  2015-06-09 09:55:02 |  34076756 | 2015-07-04 06:53:59 | 2015-07-04 07:14:30 | osm2pgsql
>>  2015-06-09 09:55:02 |  34076756 | 2015-07-04 06:53:49 | 2015-07-04 06:53:59 | osmosis
>>  2015-06-09 08:56:01 |  26087298 | 2015-07-04 06:20:20 | 2015-07-04 06:53:49 | index
>>  2015-06-09 08:56:01 |  26087298 | 2015-07-04 06:07:22 | 2015-07-04 06:20:20 | osm2pgsql
>>
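A quick sanity check on those rows: each batch covers the span between successive batchend values, and the wall time is from the osmosis starttime to the index endtime of the same batch. A rough Python sketch using timestamps copied from the output above (just the two most recent complete batches; a ratio above 1 means the server is gaining on the backlog):

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S"

# (batchend, starttime of the osmosis step, endtime of the index step),
# copied from the two most recent complete batches in the table above
batches = [
    ("2015-06-09 09:55:02", "2015-07-04 06:53:49", "2015-07-04 07:41:11"),
    ("2015-06-09 10:55:02", "2015-07-04 07:41:11", "2015-07-04 08:34:06"),
]

prev_end = datetime.strptime("2015-06-09 08:56:01", fmt)
for batchend, start, end in batches:
    be = datetime.strptime(batchend, fmt)
    covered = (be - prev_end).total_seconds()  # OSM time the batch covers
    took = (datetime.strptime(end, fmt)
            - datetime.strptime(start, fmt)).total_seconds()  # wall time spent
    print(f"covered {covered / 60:.0f} min of diffs in {took / 60:.0f} min"
          f" -> catch-up ratio {covered / took:.2f}")
    prev_end = be
```

So it is catching up, but only at roughly 1.1-1.25x real time, which fits Sarah's warning below about averages.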
>>
>>>
>>>> Is it catching up by downloading minutely diffs or using larger
>>>> intervals, then switching to minutely diffs when it is almost fully up
>>>> to date?
>>>
>>> That depends on how you have configured it. If it is set to the URL
>>> of the minutelies it will use minutely diffs but accumulate them
>>> to batches of the size you have configured. When it has caught up
>>> it will just accumulate the latest minutelies, so batches become
>>> smaller.
>>
>> Ah yes, I see the configuration.txt has:
>
> (oops - last email was sent prematurely)
>
> # The URL of the directory containing change files.
> baseUrl=http://planet.openstreetmap.org/replication/minute
>
> # Defines the maximum time interval in seconds to download in a single invocation.
> # Setting to 0 disables this feature.
> maxInterval = 3600
>
>>
>>
>>>
>>>> This phase still seems very disk intensive, will that settle down and
>>>> become much less demanding when it has eventually got up to date?
>>>
>>> It will become less but there still is IO going on. Given that your
>>> initial import took about 10 times as long as the best time I've seen,
>>> it will probably take a long time to catch up. You should consider
>>> running with --index-instances 2 while catching up and you should
>>> really investigate where the bottleneck in the system is.
>
> I notice that our postgresql.conf has
>
> work_mem = 512MB
>
> which seems a bit small?
>
> But this seems healthy:
> maintenance_work_mem = 10GB
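For context (my understanding of the PostgreSQL docs, not something from this thread): work_mem is allocated per sort or hash operation, per connection, while maintenance_work_mem feeds maintenance operations such as CREATE INDEX during the index step. An illustrative fragment:

```ini
# postgresql.conf -- illustrative values, not a recommendation
work_mem = 512MB              # per sort/hash operation, per connection
maintenance_work_mem = 10GB   # used by CREATE INDEX, VACUUM, etc.
```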
>
>>>
>>>> Can the whole installed running Nominatim be copied to another
>>>> machine? And set running?
>>>>
>>>> Presumably this is a database dump and copy - but how practical is that?
>>>
>>> Yes, dump and restore is possible. You should be aware that indexes
>>> are not dumped, so it still takes a day or two to restore the complete
>>> database.
>>>
>>>> Are there alternative ideas such as replication or backup?
>>>
>>> For backup you can do partial dumps that contain only the tables needed
>>> for querying the database. These dumps can be restored faster, but
>>> they are not updateable, so they are more of an interim solution
>>> to install on a spare emergency server while the main DB is reimported.
>>> The dump/backup script used for the osm.org servers can be found here:
>>>
>>> https://github.com/openstreetmap/chef/blob/master/cookbooks/nominatim/templates/default/backup-nominatim.erb
>>>
>>> If you go down that road, I recommend actually trying the restore
>>> at least once, so you get an idea about the time and space requirements.
>>>
>>> Replication is possible as well. In fact, the two osm.org servers have
>>> been running as master and slave with streaming replication for about
>>> two weeks now. You should disable writing logs to the database.
>>> Otherwise the setup is fairly standard, following largely this
>>> guide: https://wiki.postgresql.org/wiki/Streaming_Replication
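For anyone following along, the core of the streaming-replication setup on PostgreSQL of this era boils down to a couple of config fragments; the hostname and values here are placeholders, not taken from the thread:

```ini
# postgresql.conf on the master
wal_level = hot_standby       # write enough WAL detail for a hot standby
max_wal_senders = 3           # allow replication connections
wal_keep_segments = 256       # retain WAL for a slave that falls behind

# recovery.conf on the slave
standby_mode = 'on'
primary_conninfo = 'host=master.example.com user=replicator'
```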
>
> We've put off trying this - for now at least.
>
>>>
>>>> > string(123) "INSERT INTO import_osmosis_log values
>>>> > ('2015-06-08T07:58:02Z',25816916,'2015-07-03 06:07:34','2015-07-03
>>>> > 06:44:10','index')"
>>>> > 2015-07-03 06:44:10 Completed index step for 2015-06-08T07:58:02Z in
>>>> > 36.6 minutes
>>>> > 2015-07-03 06:44:10 Completed all for 2015-06-08T07:58:02Z in 58.05 minutes
>>>> > 2015-07-03 06:44:10 Sleeping 0 seconds
>>>> > /usr/local/bin/osmosis --read-replication-interval
>>>> > workingDirectory=/home/nominatim/Nominatim/settings --simplify-change
>>>> > --write-xml-change /home/nominatim/Nominatim/data/osmosischange.osc
>>>> >
>>>> > Which presumably means it is updating June 8th? (What else can I read
>>>> > from this?)
>>>
>>> See above, check out the import_osmosis_log. The important thing to take
>>> away is how long it takes to update which interval. If on average the
>>> import takes longer than real time you are in trouble.
>>>
>>>> > Also, at what point is it safe to expose the Nominatim as a live service?
>>>
>>> As soon as the import is finished. Search queries might interfere with
>>> the updates when your server gets swarmed with lots of parallel queries
>>> but I doubt that you have enough traffic for that.
>
> Yeah - shouldn't be too many - at this stage.
>
>>> Just make sure to keep
>>> the number of requests that can hit the database in parallel at a moderate
>>> level. Use php-fpm with limited pools for that and experiment with the
>>> limits until you get the maximum performance.
>>>
>>> Sarah
>>
>>
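As an aside, the "limited pools" suggestion maps to the pm settings in a php-fpm pool file; an illustrative sketch (the pool name and numbers are guesses to be tuned, not from the thread):

```ini
; /etc/php-fpm.d/nominatim.conf -- illustrative values only
[nominatim]
pm = static          ; fixed number of worker processes
pm.max_children = 8  ; caps the requests hitting the database in parallel
```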

Just a few more questions...

I'll need to restart postgres to have some config changes take effect
so this means I'll need to interrupt the updates.

During the

./utils/update.php --import-osmosis-all --no-npi --osm2pgsql-cache 24000 --index-instances 2

phase, can I stop it at any time with Ctrl+C?

Or only when I am seeing lines like:

  Done 929 in 147 @ 6.319728 per second - Rank 26 ETA (seconds): 801.616821

To resume, do I just use the same command, or do I have to do anything else first?

Thanks again for your patient help with this - we use MySQL in
CycleStreets and Postgres is rather unfamiliar territory for me.

-- 
Simon Nuttall

Route Master, CycleStreets.net


