[OSM-dev] renderd questions....

Tue Jan 11 21:59:54 GMT 2011

On Tue, 2011-01-11 at 13:42 -0600, Samir Faci (Dev) wrote:
> I've been using renderd / render_list to generate tiles as needed.
> 
> I usually issue one of two versions of the command, depending on need.
> 
> 
> render_list -v --all -n 15 --socket=/var/run/renderd/renderd.sock \
>     --min-zoom=0 --max-zoom=9    ## to generate all tiles.
> 
> cat expired_list | render_expired -v -n 15 \
>     --socket=/var/run/renderd/renderd.sock --min-zoom=10 --max-zoom=13

"-n 15" sounds a little high. Have you tried reducing it to see whether
the system becomes more stable? You may be hitting some resource limit
by trying to render 15 requests in parallel.
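
For example, the same expiry job can be re-run with far fewer client
threads. This is a minimal sketch: rerender_expired is a hypothetical
wrapper, the default of 4 threads is an arbitrary starting point rather
than a recommendation, and the socket path and zoom range are copied
from your command:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: re-run the expiry job with a smaller, configurable
# number of render_expired threads (default 4, an arbitrary choice).
rerender_expired() {
  local threads=${1:-4} list=${2:-expired_list}
  render_expired -v -n "$threads" \
    --socket=/var/run/renderd/renderd.sock \
    --min-zoom=10 --max-zoom=13 < "$list"
}
```

Start with something like rerender_expired 4 expired_list and step the
count back up while watching whether renderd stays up.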

> Lately it seems like I've been stuck on an expired_list, but while
> debugging I've occasionally noticed stderr messages like the ones
> listed below:
> 
> 
> process: x=1272 y=2952 z=13
> socket connect failed for: /var/run/renderd/renderd.sock
> render: /tiles/default/13/0/0/75/248/136.meta
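
The "render:" line names the metatile file on disk rather than the
individual tile. Here is a rough sketch of how the two relate. It
assumes mod_tile's usual layout (8x8 metatiles, with each path
component packing 4 bits of x and 4 bits of y) and takes /tiles and
the style name "default" from the log line above. meta_path is a
hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: map tile x/y/z to its metatile file, assuming mod_tile's
# hashing scheme with METATILE=8. TILE_DIR and STYLE come from the log.
METATILE=8
TILE_DIR=/tiles
STYLE=default

meta_path() {
  local x=$1 y=$2 z=$3
  local mask=$((METATILE - 1))
  local parts=() i
  # Align to the top-left tile of the metatile
  x=$((x & ~mask))
  y=$((y & ~mask))
  for i in 0 1 2 3 4; do
    # High nibble from x, low nibble from y
    parts[i]=$(( ((x & 15) << 4) | (y & 15) ))
    x=$((x >> 4))
    y=$((y >> 4))
  done
  # Most significant component first in the path
  printf '%s/%s/%d/%d/%d/%d/%d/%d.meta\n' "$TILE_DIR" "$STYLE" "$z" \
    "${parts[4]}" "${parts[3]}" "${parts[2]}" "${parts[1]}" "${parts[0]}"
}
```

Under those assumptions, meta_path 1272 2952 13 prints
/tiles/default/13/0/0/75/248/136.meta, the same file as in the log.
Every tile with x in 1272..1279 and y in 2952..2959 maps to that file.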

> Usually I get those connection errors when the renderd service dies
> for some unknown reason, or the postgres db has problems.

That is unusual. The renderd process on the main OSM server runs for
months at a time without any crashes. 

> I can most likely handle or fix the issue through a service
> restart.... though I was wondering if there was a clean way to monitor
> these errors.
> 
> Can I start renderd with additional logging somehow? Or make renderd
> error out if it sees these sorts of errors?

The main thing to do would be to capture a core dump, or to run renderd
with GDB attached, to catch any errors. This should identify which code
is causing the crash, which is the underlying reason for the connect
errors.
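
Here is a rough sketch of the GDB route. It assumes gdb and pidof are
installed and that renderd is already running. renderd_backtrace is a
hypothetical helper name. For the core-dump route instead, run
ulimit -c unlimited in the shell that starts renderd:

```shell
#!/usr/bin/env bash
# Hypothetical helper: attach gdb to the running renderd, let it carry on,
# and print a backtrace of every thread once it stops on a fatal signal.
renderd_backtrace() {
  local pid
  # -s: return a single PID even if several renderd processes are running
  pid=$(pidof -s renderd) || { echo "renderd is not running" >&2; return 1; }
  # SIGPIPE (e.g. from a client that went away) is not a crash,
  # so tell gdb not to stop on it
  gdb -batch -p "$pid" \
    -ex 'handle SIGPIPE nostop noprint pass' \
    -ex continue \
    -ex 'thread apply all bt'
}
```

Run renderd_backtrace as root and leave it waiting. If renderd crashes,
the per-thread backtraces are printed, and those are what is needed to
track the problem down.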

The only instability problem I know of is with some versions of GDAL,
which make multi-threaded rendering with the Mapnik gdal plugin likely
to trigger crashes. Does your style make use of any GDAL data sources
(the default osm.xml does not)?
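
Here is one quick way to check. It assumes the style declares
datasource types the way osm.xml does (<Parameter name="type">...
</Parameter>). uses_gdal is a hypothetical helper. A plain grep like
this will miss datasources pulled in from XML entity include files
unless you point it at those files too:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given Mapnik XML file contains
# a datasource whose type parameter is "gdal".
uses_gdal() {
  grep -Eq '<Parameter name="type">[[:space:]]*gdal[[:space:]]*</Parameter>' "$1"
}
```

Usage: uses_gdal osm.xml && echo "style uses the gdal plugin"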

The other possibility is a resource-related problem which is only
triggered when lots of data-heavy tiles are rendered in parallel. This
could be a memory issue, a database connection limit or something else.
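
If memory is the suspect, it is worth checking whether the kernel's
OOM killer has been killing renderd. This sketch assumes the usual
kernel log wording ("Out of memory: Kill process N (renderd) ..." or
"Killed process N (renderd) ..."). The exact text varies between
kernel versions. renderd_oom_kills is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical filter: print kernel log lines (read from stdin) that
# record the OOM killer killing a renderd process.
renderd_oom_kills() {
  grep -E '[Kk]ill(ed)? process [0-9]+ \(renderd\)'
}
```

Usage: dmesg | renderd_oom_kills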

Another possibility is that you need to increase the Mapnik PostGIS
connection pool size to more than the number of threads you are using:

http://trac.mapnik.org/wiki/XMLConfigReference
"Additional parameters for type postgis ... max_size (default 10)"

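Here is a rough check of the pool size against the thread count. It
assumes max_size is set with <Parameter name="max_size">N</Parameter>
(point it at whichever file actually holds your datasource settings),
and it falls back to the default of 10 quoted above. pool_size_ok is a
hypothetical helper, and it does not distinguish postgis datasources
from others:

```shell
#!/usr/bin/env bash
# Hypothetical check: is the smallest max_size in the style larger than
# the thread count? Uses Mapnik's documented default of 10 when unset.
pool_size_ok() {
  local style=$1 threads=$2 size
  size=$(grep -o '<Parameter name="max_size">[0-9]*</Parameter>' "$style" \
    | grep -o '[0-9][0-9]*' | sort -n | head -n 1)
  size=${size:-10}
  if [ "$size" -le "$threads" ]; then
    echo "max_size=$size is not larger than $threads threads" >&2
    return 1
  fi
  return 0
}
```

If the style sets no max_size, pool_size_ok osm.xml 15 reports
max_size=10 and returns non-zero, matching the "-n 15" case above.
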
> Also, if the connection to the socket failed, would the tile request be
> re-issued, or would it simply be ignored?

This particular error message can only occur when the render_expired
threads are starting up. It means that one of the "n" threads was not
able to connect and therefore is not used to submit requests to
renderd. The remaining threads will handle all the requests, so none
are lost.
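
If you still want to be told when this happens, one option is to pass
render_expired's output through a small filter. This sketch matches
the "socket connect failed for:" text from your log.
watch_connect_errors and alert_connect_failure are hypothetical names,
and logger(1) is just one possible way to raise the alert:

```shell
#!/usr/bin/env bash
# Hypothetical alert hook: replace with mail, a pager, etc.
alert_connect_failure() {
  logger -t render_expired -- "$1"
}

# Copy stdin to stdout unchanged, raising an alert for each connect failure.
watch_connect_errors() {
  local line
  while IFS= read -r line; do
    printf '%s\n' "$line"
    case $line in
      "socket connect failed for:"*) alert_connect_failure "$line" ;;
    esac
  done
}
```

Usage: cat expired_list | render_expired -v -n 4
--socket=/var/run/renderd/renderd.sock --min-zoom=10 --max-zoom=13
2>&1 | watch_connect_errors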

It actually surprises me that you would ever get these connect failures
in normal operation unless you happened to exceed the MAX_CONNECTIONS
value (default of 2048 in the current code). Could you be running
several different instances of render_expired in parallel, e.g. spawning
a new one for each minute diff without waiting for the last one to
complete?
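
If that is what is happening, one way to stop runs from overlapping is
to serialise them on a lock file. This is a minimal sketch assuming
util-linux flock(1) is available. The helper name and the lock path in
the usage line are arbitrary choices:

```shell
#!/usr/bin/env bash
# Run a command while holding an exclusive lock on the given file.
# flock(1) blocks until any earlier holder releases the lock, so
# overlapping runs execute one after another instead of in parallel.
serialized() {
  local lock=$1
  shift
  flock "$lock" "$@"
}
```

Usage: serialized /var/lock/render_expired.lock sh -c 'cat expired_list
| render_expired -n 4 --socket=/var/run/renderd/renderd.sock
--min-zoom=10 --max-zoom=13'

Blocking, rather than flock -n, means an overlapping run waits its turn
instead of being skipped, so no expiry list is dropped.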

> I basically want to know whether these types of messages are problematic,
> and if so, how to catch the event and address them, or at the very least
> get notified when they do occur.

The renderd process should never crash. I would like to fix any crash
which is reproducible, so long as it is within the renderd or Mapnik
code. If, however, you are happy with it crashing periodically, there
are several service monitoring tools which will automatically restart
failed processes, e.g. 'init' with the 'respawn' option in
/etc/inittab. You will need to investigate which options are available
for your OS (you don't say what you are using).
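
If you do go down that route before finding the cause, here is a
minimal respawn loop as a stopgap. It is only a sketch. It assumes
your renderd build can run in the foreground with -f. supervise,
notify_exit and the 5 second delay are hypothetical choices. init,
upstart or a dedicated monitoring tool will do this more robustly:

```shell
#!/usr/bin/env bash
# Hypothetical notification hook: logs via syslog; swap in mail etc.
notify_exit() {
  logger -t renderd-supervise -- "$*"
}

# Run "$@" repeatedly, reporting each exit. The first argument caps the
# number of runs (0 = run forever); RESTART_DELAY sets the pause in seconds.
supervise() {
  local max=$1 runs=0 status
  shift
  while :; do
    "$@"
    status=$?
    runs=$((runs + 1))
    notify_exit "'$*' exited with status $status (run $runs)"
    if [ "$max" -gt 0 ] && [ "$runs" -ge "$max" ]; then
      return "$status"
    fi
    sleep "${RESTART_DELAY:-5}"
  done
}
```

Usage: supervise 0 renderd -f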

    Jon