[Tilesathome] Using RAM-drive for ROMA temp tables

Milenko milenko at king-nerd.com
Mon Dec 8 17:17:06 GMT 2008


> -----Original Message-----
> From: tilesathome-bounces at openstreetmap.org
> [mailto:tilesathome-bounces at openstreetmap.org] On Behalf Of Brett Henderson
> Sent: Sunday, December 07, 2008 4:52 PM
> To: Martijn van Oosterhout
> Cc: tilesathome at openstreetmap.org; Mathieu Arnold
> Subject: Re: [Tilesathome] Using RAM-drive for ROMA temp tables
> 
> Martijn van Oosterhout wrote:
> > On Sat, Dec 6, 2008 at 9:22 AM, Mathieu Arnold <mat at mat.cc> wrote:
> >
> >> I'd like to add one thing to that. On my instance, the query returning
> >> the nodes in the bbox takes at most 1s, and most of the time is under
> >> 0.1s. The index takes about 14GB, and I only have 3.5GB of RAM. I do
> >> think it's *not* that bad :-)
> >>
> >
> > Note it's more complicated still. Even though the index is 14GB, if
> > you remove all the leaves of the index, it's probably less than 1GB
> > because the width of the index entry is so small. That you cache
> > easily. Which means that each node lookup will take at most 2 disk
> > seeks. Add locality of reference by area and the fact that render
> > requests are not distributed evenly over the world and the average
> > performance would be pretty good.
> >
> From memory, when I was playing with this a few months back I came to
> similar conclusions.  Identifying nodes and storing ids in a temp table
> didn't take all that long.  What took longer was usually the retrieval
> of actual node data based on those values.  Locality of values does
> help considerably because I suspect pgsql or the OS itself will usually
> read more data than it needs at a time, which means that the disk isn't
> hit for every individual node.
> 
> I wonder if you'd get any performance increases by filtering the data
> that is stored in the db in the first place.  For example, created_by
> tags would be a prime candidate for discarding.  The less data in the
> db, the closer the data will be packed together and in theory the fewer
> disk seeks that will occur.  If ROMA is only being used for tiles at home
> you could be fairly selective about the data that is imported.
> 
> Brett

I count the following tags that look like they could probably be removed
from the ROMA db:

	173 million  tiger:tlid
	168 million  tiger:upload_uuid
	 42 million  created_by
	191 million  source
	
These are all from the node_tags table, for which I count 773 million rows
total. Together the four tags account for about 574 million rows, so removing
them would shrink the table by roughly 74%.
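
To put a number on that reduction, here is a minimal Python sketch. The
counts are the rounded figures from the list above (in millions), so the
result is only approximate; the variable names are made up for illustration.

```python
# Rounded node_tags row counts per key, in millions (from the list above).
candidate_tags = {
    "tiger:tlid": 173,
    "tiger:upload_uuid": 168,
    "created_by": 42,
    "source": 191,
}
total_rows = 773  # total rows counted in node_tags, in millions

removable = sum(candidate_tags.values())   # rows that would be dropped
remaining = total_rows - removable         # rows left after filtering
fraction = removable / total_rows          # share of the table removed

print(f"removable: {removable} million rows ({fraction:.0%} of the table)")
print(f"remaining: {remaining} million rows")
```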

The counts for the same tags in the way_tags table seem to be about 10-15%
of these numbers.

Is this worth trying?  I'll try it on my server if no one else wants to
volunteer. :)  Are we sure that these tags are not needed by the clients?
Anyone have any other tags that might not be needed?
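
For anyone who wants to experiment, the filtering could look roughly like
the sketch below. It is only an illustration: it uses Python's built-in
sqlite3 with an in-memory database as a stand-in for the real PostgreSQL
instance, and the node_tags layout (node_id, k, v) and the function name
are assumptions, not the confirmed ROMA schema.

```python
import sqlite3

# Tag keys proposed for removal in this thread.
UNNEEDED_KEYS = ("tiger:tlid", "tiger:upload_uuid", "created_by", "source")


def drop_unneeded_tags(conn, keys=UNNEEDED_KEYS):
    """Delete node_tags rows whose key is in `keys`; return rows deleted."""
    placeholders = ",".join("?" for _ in keys)
    cur = conn.execute(
        f"DELETE FROM node_tags WHERE k IN ({placeholders})", tuple(keys)
    )
    conn.commit()
    return cur.rowcount


# Tiny in-memory demo with a hypothetical (node_id, k, v) table layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_tags (node_id INTEGER, k TEXT, v TEXT)")
conn.executemany(
    "INSERT INTO node_tags VALUES (?, ?, ?)",
    [
        (1, "created_by", "JOSM"),
        (1, "highway", "traffic_signals"),
        (2, "source", "survey"),
        (2, "name", "Main Street"),
        (3, "tiger:tlid", "12345"),
    ],
)

deleted = drop_unneeded_tags(conn)
kept = conn.execute(
    "SELECT node_id, k, v FROM node_tags ORDER BY node_id, k"
).fetchall()
print(deleted, kept)
```

On PostgreSQL itself a plain DELETE would probably not pack the remaining
rows closer together on disk; filtering at import time, or rewriting the
table afterwards, is likely needed to get the benefit Brett describes.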

-Jeremy





