[OSM-dev] Running PostGIS on limited memory

Jon Burgess jburgess777 at googlemail.com
Wed Mar 7 20:10:29 GMT 2007


On Wed, 2007-03-07 at 08:44 +0100, Raphaël Jacquot wrote:
> Nick Whitelegg wrote:
> > On Wednesday 07 Mar 2007 00:39, Nick Whitelegg wrote:
> >> Has anyone encountered this? I just can't get it to run. (latest SVN
> >> version)
> > 
> > I think I figured this out after posting, but by that time I'd gone to bed :-) 
> > It's using up too much memory - the array of 35 million nodes and segments is 
> > more than my Bytemark VM can handle. It looks like osm2pgsql.c will need a 
> > rewrite so that it uses a std::map rather than an array.
> 
> no, it's just doing things the wrong way.
> the real solution is to use the functionality of the db server
> (storing arrays of things and allowing them to be fetched quickly)
> to do it. loading everything in RAM is the opposite of what should
> be done.

That is certainly an approach which would work. Unfortunately it would
take a fair amount of effort to re-code the relevant sections of
osm2pgsql. It would also involve creating a direct database connection
from the program to the DB instead of the indirect SQL output it
generates now.
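
Something like this rough sketch with libpq is what I mean (purely
illustrative -- the connection string, table and column names are made
up for the example, they are not what osm2pgsql does today):

/* Compile as C++ with -lpq. Minimal error handling. */
#include <libpq-fe.h>
#include <cstdio>

int main()
{
    PGconn *conn = PQconnectdb("dbname=gis");  /* assumed DB name */
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Temp table holds node positions instead of a big RAM array */
    PQclear(PQexec(conn, "CREATE TEMP TABLE tmp_nodes "
                         "(id int4 PRIMARY KEY, lat float8, lon float8)"));

    /* While parsing the .osm file: insert each node as it is seen */
    const char *ins[3] = { "1", "51.5", "-0.1" };
    PQclear(PQexecParams(conn, "INSERT INTO tmp_nodes VALUES ($1,$2,$3)",
                         3, NULL, ins, NULL, NULL, 0));

    /* Later, when a segment references the node, fetch it back by id */
    const char *key[1] = { "1" };
    PGresult *res = PQexecParams(conn,
        "SELECT lat, lon FROM tmp_nodes WHERE id = $1",
        1, NULL, key, NULL, NULL, 0);
    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
        printf("node 1: %s,%s\n", PQgetvalue(res, 0, 0),
                                  PQgetvalue(res, 0, 1));
    PQclear(res);

    PQfinish(conn);
    return 0;
}

Done naively the per-node round trips would be slow; batching the
inserts with COPY would be the obvious next step.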

I think you are overstating your position in calling the current
approach the 'wrong way'. It has been working reasonably well for
several people for the past few months. As with many things, there are
trade-offs to be made.


> > Incidentally, before I spend too much time on this, would the PostGIS approach 
> > of storing the whole of the UK (with appropriate caching of generated images) 
> > on the Bytemark VM be feasible, given that I only have 160MB available, or 
> > would I be better off with the original plan of fetching .osm data in smaller 
> > tiles from the live OSM server, generating images on the fly and caching them 
> > for, say a month?

That sounds reasonable, but one of the design assumptions the current
code makes is that the dataset is 'dense', i.e. most node, segment and
way IDs exist and have data. This is why the current code uses static
arrays for these, which is more efficient than a dynamic structure
(e.g. std::map). If, however, we assumed a sparse data set, then some
dynamic storage scheme for nodes/segments/ways would be more
efficient.
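
To illustrate the trade-off, a simplified sketch (the struct layout
and ID range are made up for the example, not the exact osm2pgsql
internals):

#include <map>
#include <cstdlib>

struct node { double lat, lon; };

#define MAX_NODE_ID 35000000  /* roughly the current top of the ID range */

/* Dense: one slot per possible ID, allocated up front. O(1) lookup,
 * no per-entry overhead, but memory scales with MAX_NODE_ID even for
 * IDs that never appear in the input file. */
static node *nodes_dense = (node *)calloc(MAX_NODE_ID, sizeof(node));

/* Sparse: memory grows only with the nodes actually seen, but lookups
 * are O(log n) and each entry carries tree bookkeeping overhead. */
static std::map<int, node> nodes_sparse;

void store_node(int id, double lat, double lon)
{
    nodes_dense[id].lat = lat;   /* dense path */
    nodes_dense[id].lon = lon;

    node n = { lat, lon };       /* sparse path */
    nodes_sparse[id] = n;
}

For a planet file, where nearly every ID is present, the array wins;
for a small extract with globally-numbered IDs, the map would.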

Given the current design, this means that even if you cut down the OSM
file size you will not save much memory: the arrays are sized by the
ID range, not by how many IDs actually appear in your file.
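
To put rough numbers on it (the per-node size is my assumption, not an
exact figure from the code): with the ID range at ~35 million, an
array holding two 8-byte doubles per slot is about 560MB, and even two
4-byte ints per slot would be 280MB -- well beyond a 160MB VM
regardless of how few of those IDs your cut-down file actually uses.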

I may take a look at integrating the direct DB storage for intermediate
data, but I won't promise any timescales.

	Jon





