[OSM-dev] Running PostGIS on limited memory

Wed Mar 7 21:27:26 GMT 2007

On Wed, 2007-03-07 at 21:54 +0100, Olivier Macchioni wrote:
> 
> Hi John, hi all... 
> 
>         That sounds reasonable, but one of the design assumptions that
>         the current code makes is that the dataset is 'dense', i.e. most
>         node, segment and way IDs exist and have data. This is why the
>         current code uses static arrays for these, which is more
>         efficient than a dynamic structure (e.g. std::map). If, however,
>         we assumed a sparse data set, then some dynamic storage system
>         for nodes/segments/ways would be more efficient.
> 
> 
> For what it's worth, on a DB which is a few weeks old:
> 
> mysql> select max(id), count(*), max(id)/count(*) from nodes;
> +----------+----------+------------------+
> | max(id)  | count(*) | max(id)/count(*) |
> +----------+----------+------------------+
> | 25399795 |  7618099 |             3.33 |
> +----------+----------+------------------+
> 1 row in set (0.01 sec)
> 
> mysql> select max(id), count(*), max(id)/count(*) from segments;
> +----------+----------+------------------+
> | max(id)  | count(*) | max(id)/count(*) |
> +----------+----------+------------------+
> | 21392044 |  7539635 |             2.84 |
> +----------+----------+------------------+
> 1 row in set (0.07 sec)
> 
> On the performance side, obviously allocating a static array once is
> much faster than playing with hashes.
> 
When the TIGER data was in the DB I believe the density was over 90%. I
originally developed the C version of osm2pgsql at that time.

The data was removed because it contained large amounts of duplicate
data (see
http://lists.openstreetmap.org/pipermail/dev/2006-November/002528.html )

Since the TIGER data was around 80% of the whole DB, its removal left
quite a lot of unused IDs.
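
To make the trade-off concrete, here is a minimal C++ sketch of the two storage
strategies discussed in this thread. It is not the osm2pgsql code: the Node
layout and the names density, DenseStore and SparseStore are hypothetical and
chosen only for illustration. With the figures quoted above, the density
(count divided by max id) comes to roughly 0.30 for nodes and 0.35 for
segments, so a dense array would leave about two thirds of its slots unused.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical node record; the real osm2pgsql layout is not shown in this thread.
struct Node {
    double lat;
    double lon;
};

// Fraction of the ID range [1, max_id] that actually holds data.
double density(std::uint64_t max_id, std::uint64_t count) {
    assert(max_id > 0);
    return static_cast<double>(count) / static_cast<double>(max_id);
}

// Dense layout: one slot per possible ID, whether it is used or not.
// Lookup is a plain index, but memory grows with max_id, not with count.
class DenseStore {
public:
    explicit DenseStore(std::size_t max_id)
        : slots_(max_id + 1), used_(max_id + 1, false) {}

    void put(std::size_t id, Node n) {
        assert(id < slots_.size());
        slots_[id] = n;
        used_[id] = true;
    }

    // Returns nullptr for IDs that are out of range or were never stored.
    const Node* get(std::size_t id) const {
        return (id < slots_.size() && used_[id]) ? &slots_[id] : nullptr;
    }

    std::size_t slot_count() const { return slots_.size(); }

private:
    std::vector<Node> slots_;
    std::vector<bool> used_;
};

// Sparse layout: only IDs that exist take space, at the cost of a tree
// lookup and per-entry overhead inside std::map.
class SparseStore {
public:
    void put(std::size_t id, Node n) { nodes_[id] = n; }

    // Returns nullptr for IDs that were never stored.
    const Node* get(std::size_t id) const {
        auto it = nodes_.find(id);
        return it == nodes_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return nodes_.size(); }

private:
    std::map<std::size_t, Node> nodes_;
};
```

For example, density(25399795, 7618099) evaluates to about 0.30. Which layout
wins in practice depends on the per-entry overhead of the dynamic structure as
well as on the density, so this is best treated as a starting point for
measurement rather than a conclusion.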

	Jon