[OSM-dev] scaling

Tom Hughes tom at compton.nu
Tue Jan 11 09:31:10 GMT 2011


On 11/01/11 03:31, Steve Singer wrote:

> On Tue, 11 Jan 2011, Matt Amos wrote:
> 
>> it's possible to rewrite it so that it doesn't use temp tables, but
>> that would be at the cost of more complex queries or higher memory
>> usage. i presume that postgres doesn't currently allow it because it's
>> technically non-trivial, but beyond that i have no idea what the level
>> of difficulty is...
>>
>> i had a quick look on the postgres mailing lists and didn't spot
>> anything obvious about it - is it worth bringing it up there and
>> seeing what their take on it is?
> 
> There have been discussions on pgsql-hackers about adding temp tables to
> replicas [1].  We aren't the only ones who would like that feature.  Temp
> tables on replicas won't be in 9.1.

That thread certainly makes it look pretty bleak. There is in fact an
OSMer in that thread although he didn't mention that this was something
we were interested in.

> Rewriting queries so they don't use temp tables won't run into any
> memory limits.  When a query result exceeds work_mem Postgresql
> automatically spills to disk.

He was talking about memory usage in cgimap, not the postgres server, as
the alternative to creating temporary tables would be caching the data
in memory in cgimap.

The only alternative to that which I know would be to read the data in
postgres multiple times.

Basically the issue is that the current strategy goes something like this:

  - Create a temporary table containing a list of nodes in the
    target area by selecting from the node tables.

  - Create a temporary table containing a list of ways in the
    target area by joining from the node temporary table to the
    way tables.

  - Create a temporary table containing a list of relations in
    the target area by joining from the node and way temporary
    tables to the relation tables.

  - Select all nodes listed in the temporary nodes table or used
    by the ways in the temporary ways tables and return them.

  - Select all ways listed in the temporary ways tables and
    return them.

  - Select all relations listed in the temporary relations
    tables and return them.

Obviously that can be rewritten by replacing each use of a temporary
table with a subquery but as the temporary tables are used multiple
times that leads to multiple evaluations of those queries.

Hence Matt was saying that the real alternative would be to replace the
temporary tables with storing the lists of objects in memory on the
cgimap client. That has both a memory overhead and a cost overhead in
sending long lists of objects back to postgres in subsequent queries.

> It might also be worth thinking about some of non built-in replication
> options (ie Slony or Londiste).  They do introduce some overhead to
> writes on the master but if we can offload enough reads you can see
> significant overal gains. Administration pain will go up but that comes
> with the teritory of clustering.

Everything I've read says that the non-builtin replication systems are a
huge amount of hassle.

Tom

-- 
Tom Hughes (tom at compton.nu)
http://compton.nu/



More information about the dev mailing list