[Geocoding] how to estimate hardware needed for nominatim osm2pgsql

Paul Norman penorman at mac.com
Sat Mar 3 01:19:06 UTC 2018


On 3/2/2018 2:01 AM, Josip Rodin wrote:
> On Thu, Mar 01, 2018 at 09:41:54AM -0800, Paul Norman wrote:
>>> when I import the nodes for all of Europe, the ways get processed at a
>>> rate of 30/s
>> It's slow during the osm2pgsql import stage. General advice for
>> osm2pgsql applies here. For a large import, you want more RAM. Ideally,
>> you should have enough cache to fit all the node positions in RAM. For
>> Europe, this is probably 20GB to 25GB on a machine with 32GB of RAM.
> Yesterday the Europe import told me that it processed 2045878k nodes.
> At 8 bytes per lat and 8 per long, that sounds more like 30.5 GB? Not sure
> where osm2pgsql reads it from... st_memsize(place.geometry) seems to return
> 32 bytes actually, would that imply 41 GB? That would seem to match the size
> of the flatnode file, too.

Node positions take 8 bytes per node, and cache efficiency is about 85% 
for the full planet. I haven't done an import for Europe recently, but 
taking 60% as a guess, you would need about 26GB of cache to hold all 
node positions.
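
As a rough sketch of that arithmetic (the function name is only for 
illustration; the node count is the figure quoted above from the Europe 
import, and the 60% efficiency is the guess, not a measurement):

```python
def cache_needed_bytes(node_count, efficiency, bytes_per_node=8):
    """Cache size needed to hold every node position.

    efficiency is the fraction of the cache that ends up holding
    useful node positions, in the range (0, 1].
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return node_count * bytes_per_node / efficiency


# "2045878k nodes" as reported by the Europe import quoted above.
europe_nodes = 2045878 * 1000

needed = cache_needed_bytes(europe_nodes, 0.60)
print(f"{needed / 1e9:.1f} GB, {needed / 2**30:.1f} GiB")
# osm2pgsql's -C option takes the cache size in MB.
print(f"-C {needed / 2**20:.0f}")
```

This gives roughly 27 GB in decimal units, or about 25 GiB, which 
brackets the 26GB estimate. Swap in the efficiency from your own import 
log once you have one.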

Because flat nodes are persistent, they're designed differently: the 
file takes 8 bytes * maximum node ID + a few hundred bytes for headers, 
regardless of how many nodes the extract actually contains.
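
The same kind of estimate for the flat node file, as a sketch only. The 
maximum node ID below is an assumed, illustrative value that does not 
come from this thread, so substitute the highest node ID in your own 
data. The few hundred bytes of headers are left as a parameter because 
they are negligible at this scale.

```python
def flat_nodes_file_bytes(max_node_id, bytes_per_node=8, header_bytes=0):
    """Approximate flat node file size: one fixed-size slot per node ID.

    The size depends on the highest node ID, not on how many nodes
    are actually present in the extract.
    """
    return max_node_id * bytes_per_node + header_bytes


# Hypothetical value for illustration only; not taken from this thread.
assumed_max_node_id = 5_400_000_000

size = flat_nodes_file_bytes(assumed_max_node_id)
print(f"{size / 1e9:.1f} GB, {size / 2**30:.1f} GiB")
```

A regional extract still contains nodes with high IDs, so by this 
formula its flat node file comes out about the same size as the 
planet's.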

> Anyway, a more pertinent point would be how does the size of osm2pgsql cache
> correlate to that, i.e. how do we estimate that it would organize itself
> in a way that 20 to 25 GB would be enough to get a good hit rate?

The easiest way to get cache efficiency is to look at the log output 
after an import. You could write external software that calculates the 
efficiency for a given list of nodes, but it's easier to run osm2pgsql 
with excess cache (using -O null if you're doing it a lot). Using my 
data from 2015 and https://github.com/openstreetmap/osm2pgsql/pull/441, I 
got 84.5% efficiency for the planet, 62% for Europe, and 50-59% for 2GB 
PBFs and smaller.


