[Tilesathome] Using RAM-drive for ROMA temp tables
Milenko
milenko at king-nerd.com
Mon Dec 8 17:17:06 GMT 2008
> -----Original Message-----
> From: tilesathome-bounces at openstreetmap.org
> [mailto:tilesathome-bounces at openstreetmap.org] On Behalf Of Brett Henderson
> Sent: Sunday, December 07, 2008 4:52 PM
> To: Martijn van Oosterhout
> Cc: tilesathome at openstreetmap.org; Mathieu Arnold
> Subject: Re: [Tilesathome] Using RAM-drive for ROMA temp tables
>
> Martijn van Oosterhout wrote:
> > On Sat, Dec 6, 2008 at 9:22 AM, Mathieu Arnold <mat at mat.cc> wrote:
> >
> >> I'd like to add one thing to that. On my instance, the query returning
> >> the nodes in the bbox takes at most 1s, and most of the time is under
> >> 0.1s. The index takes about 14GB, and I only have 3.5GB of RAM. I do
> >> think it's *not* that bad :-)
> >>
> >
> > Note it's more complicated still. Even though the index is 14GB, if
> > you remove all the leaves of the index, it's probably less than 1GB
> > because the width of the index entry is so small. That you cache
> > easily. Which means that each node lookup will take at most 2 disk
> > seeks. Add locality of reference by area and the fact that render
> > requests are not distributed evenly over the world and the average
> > performance would be pretty good.
> >
> From memory when I was playing with this a few months back I came to
> similar conclusions. Identifying nodes and storing ids in a temp table
> didn't take all that long. What took longer was usually the retrieval
> of actual node data based on those values. Locality of values does help
> considerably because I suspect pgsql or the OS itself will usually read
> more data than it needs at a time which means that the disk isn't hit
> for every individual node.
>
> I wonder if you'd get any performance increases by filtering the data
> that is stored in the db in the first place. For example, created_by
> tags would be a prime candidate for discarding. The less data in the
> db, the closer the data will be packed together and in theory the fewer
> disk seeks that will occur. If ROMA is only being used for tiles at home
> you could be fairly selective about the data that is imported.
>
> Brett
I count the following tags that look like they could probably be removed
from the ROMA db:
173 million tiger:tlid
168 million tiger:upload_uuid
42 million created_by
191 million source
These are all from the node_tags table, for which I count 773 million rows
total, so this would reduce the table by a significant amount.
The way_tags table seems to hold about 10-15% of these counts.
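To put a number on "significant", here is the arithmetic for node_tags as a
small Python sketch. It uses only the counts quoted above (in millions) and
assumes all four tags would be dropped together.

```python
# Row counts in millions for node_tags, taken from the figures above.
candidate_tags = {
    "tiger:tlid": 173,
    "tiger:upload_uuid": 168,
    "created_by": 42,
    "source": 191,
}
total_rows = 773  # total node_tags rows, in millions

removable = sum(candidate_tags.values())
remaining = total_rows - removable
fraction = removable / total_rows

print(f"removable: {removable}M, remaining: {remaining}M "
      f"({fraction:.0%} of the table)")
```

That works out to roughly three quarters of the node_tags rows.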
Is this worth trying? I'll try it on my server if no one else wants to
volunteer. :) Are we sure that these tags are not needed by the clients?
Anyone have any other tags that might not be needed?
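For anyone who wants to try the same thing, here is a rough sketch of the
count-then-delete step. It is only an illustration: it runs against an
in-memory SQLite database rather than the real PostgreSQL server, and the
node_tags(node_id, k, v) layout and the sample rows are assumptions, not the
actual ROMA schema.

```python
import sqlite3

# Tag keys proposed for removal in this thread.
UNNEEDED_KEYS = ("tiger:tlid", "tiger:upload_uuid", "created_by", "source")

# Hypothetical stand-in for the real table; column names are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_tags (node_id INTEGER, k TEXT, v TEXT)")
conn.executemany(
    "INSERT INTO node_tags VALUES (?, ?, ?)",
    [
        (1, "highway", "traffic_signals"),
        (1, "created_by", "JOSM"),
        (2, "tiger:tlid", "12345"),
        (2, "source", "tiger_import"),
        (3, "name", "Main Street"),
    ],
)


def count_by_key(connection):
    """Return {tag key: row count} for the node_tags table."""
    rows = connection.execute("SELECT k, COUNT(*) FROM node_tags GROUP BY k")
    return dict(rows)


before = count_by_key(conn)

# One "?" placeholder per key so the keys are passed as parameters.
placeholders = ", ".join("?" for _ in UNNEEDED_KEYS)
deleted = conn.execute(
    f"DELETE FROM node_tags WHERE k IN ({placeholders})", UNNEEDED_KEYS
).rowcount

after = count_by_key(conn)
print("before:", before)
print("rows deleted:", deleted)
print("after:", after)
```

One caveat for the real server: in PostgreSQL a bulk DELETE leaves dead rows
behind, so the table would probably need a VACUUM FULL or a reload before the
data is actually packed closer together. Filtering at import time, as Brett
suggests, avoids that step.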
-Jeremy