[Openstreetmap-dev] Cache the whole world.
Mikel Maron
mikel_maron at yahoo.com
Thu Feb 16 11:27:07 GMT 2006
I have been looking at tiling and have made some progress, but I haven't had as much time to spend on this as I'd like, and obviously things still aren't working. This looks like a good time for an update on what I've done, and a chance to point out some particulars where everyone could contribute.
* Mapserver Replacement
mapserver is nice software, but it's not really intended to serve multitudes of tiles simultaneously. It's bulky in memory, and overkill for OSM usage, which is simply compositing two WMS requests together. Also, it cannot be configured to use a web proxy (as needed for the Landsat cache described below).
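For reference, the whole job the replacement has to do is roughly this -- a minimal sketch using RMagick, where the URLs and query strings are placeholders, not the live endpoints:

    require 'uri'
    require 'net/http'
    require 'RMagick'

    # Fetch the two WMS layers (placeholder URLs -- substitute real GetMap requests)
    landsat = Net::HTTP.get(URI.parse('http://landsat.example/wms?request=GetMap'))
    streets = Net::HTTP.get(URI.parse('http://tile.example/streets?request=GetMap'))

    base    = Magick::Image.from_blob(landsat).first
    overlay = Magick::Image.from_blob(streets).first

    # Lay the transparent streets/gpx layer over the Landsat imagery
    tile = base.composite(overlay, 0, 0, Magick::OverCompositeOp)
    print tile.to_blob { self.format = 'PNG' }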
I've written a replacement for our mapserver usage, wms.rbx [ http://www.openstreetmap.org/trac/browser/ruby/api/wms/wms.rbx ]
which accurately reproduces the functionality. However, when I've tested it live on the tile server (transparently, through an Apache RewriteRule), things don't go so well -- tile becomes unresponsive, which is strange, since this should be much lighter weight than mapserver. I haven't had time to analyse this yet.
It could be something in the Ruby HTTP library, in ImageMagick, or some oddity in mod_ruby. If anyone could profile wms.rbx, or track down the likely cause of the trouble, that would be incredibly helpful.
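As a cheap first cut at narrowing it down, the two main suspects can be timed in isolation, outside mod_ruby -- a minimal sketch, where the URL is a placeholder for a real request:

    require 'benchmark'
    require 'uri'
    require 'net/http'
    require 'RMagick'

    uri  = URI.parse('http://onearth.example/wms?request=GetMap')  # placeholder
    blob = Net::HTTP.get(uri)

    # Time the HTTP fetch and the ImageMagick decode separately
    Benchmark.bm(12) do |b|
      b.report('http fetch') { 10.times { Net::HTTP.get(uri) } }
      b.report('im decode')  { 10.times { Magick::Image.from_blob(blob).first } }
    end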
Another thing about this script (and mapserver): it requests the streets and gpx layers through another call back to Apache -- so every tile request actually ties up two Apache threads! The script should be modified to call the streets drawing routines directly.
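In outline, something like this -- the Streets module and its draw method are hypothetical, standing in for whatever streets.rbx exposes once it's refactored into a library:

    require 'uri'
    require 'net/http'

    # Today: every tile ties up a second Apache thread
    def streets_via_http(bbox)
      Net::HTTP.get(URI.parse("http://localhost/api/wms/streets.rbx?bbox=#{bbox}"))
    end

    # Instead: call the drawing code in-process (Streets.draw is hypothetical)
    def streets_in_process(bbox, width, height)
      require 'streets'   # hypothetical library form of streets.rbx
      Streets.draw(bbox, width, height)
    end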
* Street Drawing
My profiling of streets.rbx showed Ruby to be incredibly inefficient at large numbers of iterative math and drawing calls. Perl, for one, is much faster, and C would be faster still. I quickly tested this by prototyping a Perl replacement for streets.rbx, streets.pl [ http://www.openstreetmap.org/trac/browser/ruby/api/wms/streets.pl ]. It seems pretty good, but I haven't tested it fully, and the big minus is that it duplicates the SQL calls from dao.rb.
The best solution could be writing some inline C or extending Ruby -- there's a section on this in the Pragmatic Programmers' Ruby book. The portion of streets.rbx doing the calculations and drawing could be replaced by an extension, while keeping the Ruby SQL calls.
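To give a flavour of the inline C route, here's a toy sketch using the RubyInline gem -- the real candidate for this treatment is the projection/draw loop in streets.rbx, not this trivial scaling function:

    require 'rubygems'
    require 'inline'

    class Projector
      inline do |builder|
        # Compiled to C at load time; RubyInline converts the
        # double/int arguments and the return value automatically
        builder.c <<-'EOC'
          double scale_x(double lon, double xmin, double xmax, int width) {
            return (lon - xmin) / (xmax - xmin) * width;
          }
        EOC
      end
    end

    puts Projector.new.scale_x(-0.1, -0.5, 0.5, 256)   # => 102.4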
* Landsat Caching
There is a squid cache dedicated to storing Landsat requests on the dev machine (also known as landsat.openstreetmap.org).
It is simply set up to proxy requests to OnEarth and store the successful results indefinitely. It will only respond to proxy requests originating from the tile server -- so if you want to test things, you'll need to tell me your IP address so I can update the acl.
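For the curious, the relevant part of the squid config looks something like this -- an illustrative fragment, where the address, path, and sizes are placeholders rather than the live values:

    # squid.conf fragment (placeholder values)
    acl tileserver src 10.0.0.1                   # only the tile server may proxy through
    http_access allow tileserver
    http_access deny all
    cache_dir ufs /var/spool/squid 51200 16 256   # ~50Gb on-disk cache
    # keep successful OnEarth responses essentially forever
    refresh_pattern . 525600 100% 525600 override-expire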
This cache is large (50Gb) and works very well. It's not tuned to any particular tiling -- we'll need to ensure the slippy map, Java editor, and other clients make consistent, rounded requests.
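Rounding can be as simple as snapping every bbox to a fixed grid before the request goes out. A sketch -- the 0.1-degree grid spacing and four-decimal formatting are arbitrary choices here, not an agreed convention:

    TILE_DEG = 0.1   # grid spacing in degrees -- an assumption, nothing agreed yet

    # Snap a bbox outward to the grid so every client generates identical URLs
    def rounded_bbox(xmin, ymin, xmax, ymax)
      [ (xmin / TILE_DEG).floor * TILE_DEG,
        (ymin / TILE_DEG).floor * TILE_DEG,
        (xmax / TILE_DEG).ceil  * TILE_DEG,
        (ymax / TILE_DEG).ceil  * TILE_DEG ]
    end

    # Format with fixed precision so float noise can't produce mismatched URLs
    bbox = rounded_bbox(-0.512, 51.473, -0.488, 51.481).map { |v| '%.4f' % v }.join(',')
    puts bbox   # => "-0.6000,51.4000,-0.4000,51.5000"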
I have experimented with putting a maximum precision limit on landsat tiles, making smaller tiles simple crops of lower-resolution imagery. I've also tried converting arbitrary extents on landsat into a series of crops on OnEarth's recommended caching scheme (this scheme is not yet published by them, but it is in use at http://onearth.jpl.nasa.gov/WK/). In both cases the results were poor -- the cropped images had very noticeable edges. This must be some JPEG artifact; maybe there's a solution, but I haven't researched it yet. Thoughts?
* Tile Invalidation
Here's what I've been thinking, though I haven't acted on it at all. Have cron run a script on tile periodically (every 15 minutes, if that's not too intensive). The script grabs all the changes from the db since the last time it ran. For each lat/long pair, it calculates all the tile urls above it to purge (assuming the slippy map and editor tile requests have been harmonized). Tom provided some code to do this in Javascript; it just needs to be rewritten in Ruby or whatever. With the total list of invalid urls, the script calls purge [ http://www.wa.apana.org.au/~dean/squidpurge/ ] on each one. Perhaps the source code of purge could be modified to accept a list of urls to purge. This script is a crucial place where someone could make a contribution -- a rough sketch follows.
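In this sketch, the z/x/y URL layout and the 0-17 zoom range are assumptions (they depend on exactly how the slippy map and editor requests get harmonized), and changed_points stands in for the db query:

    ZOOMS = 0..17   # assumption: whatever zoom range the tiles actually span

    # Standard spherical-mercator tile coordinates for a lat/long at one zoom
    def tile_for(lat, lon, zoom)
      n = 2 ** zoom
      x = ((lon + 180.0) / 360.0 * n).floor
      lat_rad = lat * Math::PI / 180.0
      y = ((1 - Math.log(Math.tan(lat_rad) + 1 / Math.cos(lat_rad)) / Math::PI) / 2 * n).floor
      [x, y]
    end

    # changed_points stands in for "all lat/long pairs changed since the last run"
    changed_points = [[51.5, -0.1]]
    changed_points.each do |lat, lon|
      ZOOMS.each do |z|
        x, y = tile_for(lat, lon, z)
        # print the urls; feed the list to purge (or a version patched to read stdin)
        puts "http://tile.openstreetmap.org/#{z}/#{x}/#{y}.png"
      end
    end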
The cache on tile is just 10Gb. It's currently nowhere near full, but it could approach capacity if the 48-hour expiry is dropped. Even so, it should still work pretty well -- we'd be left with a cache of highly requested, valid tiles, covering the most active areas.
Hope that gives a picture of what we're currently facing with the tiles. If anyone has time to tackle the pieces outlined here, or has thoughts or code which does it better, that's totally welcome.
Cheers, Mikel