[Geocoding] Nominatim performance

Marc Tobias mtm at opencagedata.com
Mon Mar 10 13:44:15 UTC 2014


> On 08-03-14 11:20, Cambridge Cycling Campaign - Simon Nuttall wrote:
>> A whole world nominatim installation started on 2 Feb finished on 7
>> Mar, and is now running at:
>>
>> http://nominatim.cyclestreets.net/
>>
>> It has entered the catch-up phase - which began with data from 21 Jan.
>>
>> It seems to be taking a day to catch up with a day's updates - so it
>> is not making much progress and is heavily I/O-bound, even on a 32GB
>> VM with 2 allocated processors.
>>
>> Do all these timings seem normal?
>>
>> What could I do to improve performance?
>>
>> Simon
> Most virtual machines lack I/O performance.  Even if the underlying
> storage consists of SSDs, you're only getting a slice of it.  I would
> suggest assigning a few more vCPUs; 2 seems a bit low.  But this sounds
> like a true I/O-related issue.
>
> A 'real' machine with 32GB on performant SSDs is probably the minimum
> for a planet import.  (I believe 64GB is recommended for this sort of
> install.)  I would definitely blame I/O limits here.
>
> Glenn
>

I agree. I've worked on virtual machines where copying the planet
file (a simple 'cp file file.bak') took longer than downloading it
from the internet. That's because when you reserve 1TB of storage,
the provider really only gives you, say, 100GB and expands it once
the operating system uses more. One work-around is to write 1TB of
zeros to the disk before you use it.
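
A very rough sketch of that zero-writing step, in Python (the path
and size are placeholders; adapt them to your own mount point, and
make sure the space is actually free):

  import os

  PATH = "/mnt/data/zero.fill"   # hypothetical mount point
  TOTAL = 1 * 1024**4            # 1 TiB to fill
  CHUNK = 64 * 1024**2           # write in 64 MiB chunks
  zeros = b"\0" * CHUNK

  # Forces the provider to actually back the blocks before the
  # import starts hammering them.
  with open(PATH, "wb") as f:
      written = 0
      while written < TOTAL:
          f.write(zeros)
          written += CHUNK

  os.remove(PATH)                # free the space again afterwards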

It makes perfect sense for virtual server providers to split the CPU
from the storage, because at large scale, with thousands of customers,
the two have different scaling patterns. So the storage, even SSDs,
may be installed in the same machine as the CPU but still be abstracted
enough to effectively sit on the network. And the customer is never
told.

I think (and again, Amazon probably reserves the right to abstract
it) the larger Amazon AWS instances have local, unshared SSDs:
http://aws.amazon.com/ec2/pricing/ (see 'storage optimized').

Anybody have experience building Nominatim on those machines?

One option is to rent a high-performance machine for a short time,
do a full initial build, and then move the database to a cheaper
machine for the daily/hourly updates. A 4-core, 30GB RAM, 800GB SSD
server costs 41 USD for two days, but 624 USD per month, so running
it 24/7 is not attractive.
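
If anyone tries that, the hand-off itself could be as simple as a
pg_dump on the build machine and a pg_restore on the cheap one. A
minimal sketch in Python, assuming the database is called "nominatim"
and using a placeholder dump path (you would still need the Nominatim
code and its Postgres module installed on the target before restoring):

  import subprocess

  DB = "nominatim"               # assumed database name
  DUMP = "/tmp/nominatim.dump"   # placeholder path

  # On the build machine: custom-format dump of the finished database.
  subprocess.check_call(["pg_dump", "-Fc", "-f", DUMP, DB])

  # After copying the dump to the cheaper machine, restore it there:
  # subprocess.check_call(["createdb", DB])
  # subprocess.check_call(["pg_restore", "-d", DB, DUMP])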

Again, does anybody run such a setup or is a full rebuild too
rare?

marc tobias
mtm at opencagedata.com


