[OSM-dev] Running PostGIS on limited memory
Jon Burgess
jburgess777 at googlemail.com
Wed Mar 7 21:27:26 GMT 2007
On Wed, 2007-03-07 at 21:54 +0100, Olivier Macchioni wrote:
>
> Hi John, hi all...
>
> That sounds reasonable, but one of the design assumptions that the
> current code makes is that the dataset is 'dense', i.e. most node,
> segment and way IDs exist and have data. This is why the current code
> uses static arrays for these, which is more efficient than a dynamic
> structure (e.g. std::map). If, however, we assumed a sparse data set,
> then some dynamic storage system for nodes/segments/ways would be
> more efficient.
>
>
> For what it's worth, on a DB which is a few weeks old:
>
> mysql> select max(id), count(*), max(id)/count(*) from nodes;
> +----------+----------+------------------+
> | max(id)  | count(*) | max(id)/count(*) |
> +----------+----------+------------------+
> | 25399795 |  7618099 |             3.33 |
> +----------+----------+------------------+
> 1 row in set (0.01 sec)
>
> mysql> select max(id), count(*), max(id)/count(*) from segments;
> +----------+----------+------------------+
> | max(id)  | count(*) | max(id)/count(*) |
> +----------+----------+------------------+
> | 21392044 |  7539635 |             2.84 |
> +----------+----------+------------------+
> 1 row in set (0.07 sec)
>
> On the performance side, obviously allocating a static array once is
> much faster than playing with hashes.
>
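To put rough numbers on the dense-versus-sparse trade-off, here is a minimal sketch that turns the figures from the MySQL output quoted above into a density and a memory estimate for each storage strategy. The byte sizes (SLOT_BYTES, MAP_OVERHEAD_BYTES) are assumptions picked for illustration only; they are not measured from osm2pgsql or from any particular std::map implementation, and the function names are hypothetical.

```python
# Figures copied from the MySQL output quoted above.
TABLES = {
    "nodes": {"max_id": 25399795, "count": 7618099},
    "segments": {"max_id": 21392044, "count": 7539635},
}

# Assumed sizes, for illustration only (not measured from osm2pgsql):
SLOT_BYTES = 16          # one array slot, e.g. two doubles for lat/lon
MAP_OVERHEAD_BYTES = 48  # guessed per-entry bookkeeping cost of a map node


def density(max_id, count):
    """Fraction of the ID range 0..max_id that actually holds a row."""
    return count / (max_id + 1)


def array_bytes(max_id):
    """A static array needs a slot for every possible ID, used or not."""
    return (max_id + 1) * SLOT_BYTES


def map_bytes(count):
    """A dynamic map pays payload plus overhead, but only per used ID."""
    return count * (SLOT_BYTES + MAP_OVERHEAD_BYTES)


def break_even_density():
    """Density below which the map becomes smaller than the array."""
    return SLOT_BYTES / (SLOT_BYTES + MAP_OVERHEAD_BYTES)


if __name__ == "__main__":
    print("break-even density:", break_even_density())
    for name, t in TABLES.items():
        print(
            name,
            "density", round(density(t["max_id"], t["count"]), 2),
            "array MiB", array_bytes(t["max_id"]) // 2**20,
            "map MiB", map_bytes(t["count"]) // 2**20,
        )
```

Under these assumed sizes the break-even density is 25%, so at roughly 30% (nodes) and 35% (segments) the static array would still come out slightly smaller; a larger payload or a leaner map would shift that point, so treat the result as an order-of-magnitude guide only.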
When the TIGER data was in the DB I believe the density was over 90%. I
originally developed the C version of osm2pgsql at that time.
The data was removed because it contained large amounts of duplicate
data (see
http://lists.openstreetmap.org/pipermail/dev/2006-November/002528.html ).
Since the TIGER data was around 80% of the whole DB, this left quite a
lot of unused IDs.
Jon