[OSM-dev] Status of Database Server after 0.4 Upgrade: Fragile
SteveC
steve at asklater.com
Sat May 12 21:36:03 BST 2007
So here's a braindump:
HTTP 500's. The server is running lighttpd on top of fcgi using
the script/process scripts. These are running with 3 threads with a
restart time of 2 seconds. The threads commit suicide after some
number of requests (which I will increase in a sec) to either the map
or gpx call. When that happens, lighttpd and fcgi are supposed to
pause the queue of requests (which may or may not be api requests).
My suspicion is that the queue is getting killed (and thus a bunch of
500's) instead.
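For anyone trying to reproduce this, the relevant lighttpd block looks
roughly like the below (paths and socket names are illustrative, not
copied from the live config; the suicide-after-N-requests behaviour
lives in the dispatch / script/process side, not here):

  fastcgi.server = ( ".fcgi" =>
    ( "rails" =>
      (
        "socket"    => "/tmp/osm-rails.socket",
        "bin-path"  => "/path/to/rails/public/dispatch.fcgi",
        "min-procs" => 3,
        "max-procs" => 3
      )
    )
  )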
Looking through the logs the rails stuff itself is not producing the
500s, it's something to do with the threads killing themselves.
If you have no tech experience you can summarise this on the rails
list and see if anyone can help.
If you do then you can set up a similar base and test the suicide stuff.
stats. the stats script has various errors and is outputting to the
wrong directory.
database sessions. someone needs to make a migration for storing
sessions in the database and the appropriate changes in
environment.rb or whatever
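To save whoever takes this some digging, the standard Rails recipe is
roughly the below (a sketch from memory, check it against the docs; I
think there's a rake db:sessions:create task that generates the
migration for you):

  class AddSessionsTable < ActiveRecord::Migration
    def self.up
      create_table :sessions do |t|
        t.column :session_id, :string
        t.column :data,       :text
        t.column :updated_at, :datetime
      end
      add_index :sessions, :session_id
    end

    def self.down
      drop_table :sessions
    end
  end

  # and in config/environment.rb, inside the Rails::Initializer block:
  config.action_controller.session_store = :active_record_store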
authentication. I have a hunch that we could do authentication only
on PUT/DELETE, using request.get? or something similar in the auth
code. Then, we could ask clients to send Accept headers asking for
text/xml for data from the api. Then, depending on whether the header
is present, you could spit out text/html, JSON, rdf...
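Something like this is the shape I mean (method names invented for
illustration, not what's in the codebase right now):

  def require_auth_for_writes
    return true if request.get?   # reads stay anonymous
    authorize                     # the existing basic auth check
  end

  # read calls could then key the output off the Accept header:
  respond_to do |format|
    format.xml  { render :text => doc.to_s, :content_type => "text/xml" }
    format.html { render :template => 'some/html/view' }
    # json / rdf would each need Mime::Type.register and another branch
  end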
way tags. most of the api code is fairly fast but the api map call
does an SQL request for every way to get its tags. it shouldn't;
someone needs to fix it.
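Untested, but the fix is probably along the lines of pulling all the
tags for the ways in the bbox in one query and bucketing them in Ruby
(model and column names here are guesses, check them against the
schema):

  way_ids = ways.collect { |w| w.id }
  tags_by_way = Hash.new { |h, k| h[k] = [] }
  WayTag.find(:all, :conditions => ['id IN (?)', way_ids]).each do |t|
    tags_by_way[t.id] << t
  end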
the to_xml_node code is getting ugly and map-call-specific in the
node/seg/way models. it should be stripped down and the map call
should look more like it did in the 0.3 code, doing the xml
generation itself.
RSS is missing.
very-long-ways. the map call is failing / timing out on bboxes which
have ways with some large number of segments in them. by large, I mean
thousands and thousands apparently. Someone needs to (maybe with
planet.osm) find out how big these ways are and we need to think about
what to do. Should the API not return the data and give up on too many
segments? Should we split the ways up?
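As a first stab at finding out how bad it is, something like this from
script/console would do (table name is from memory, treat as a guess):

  rows = Way.find_by_sql("select id, count(*) as segs
                          from current_way_segments
                          group by id order by segs desc limit 20")
  rows.each { |r| puts "way #{r.id} has #{r.segs} segments" }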
GPX. the import code using rails daemons works but it looks like it
might stomp on the log file it shares with the fcgi processes.
That's what it looked like the other day when I was working on
it. Someone needs to test this using lighttpd and the script/process
stuff. The code needs to be split out into the trace.rb model so I
can do Trace.find(:all, :conditions => ['inserted = ?', false]).each
{|t| t.import } or something. That will make re-importing the backlog
of old GPX files easier. Someone needs to write the code to do that
back importing too. It should do something like 'delete from
gpx_points where id = n', where n is the id of the trace, in case
there are old points in the db before importing.
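Roughly the shape I have in mind, though the actual parsing lives in
the daemon code and would need moving across (names approximate):

  class Trace < ActiveRecord::Base
    def import
      # clear out stale points for this trace first, per the delete
      # above (check the exact column name against the gpx_points schema)
      connection.execute("delete from gpx_points where gpx_id = #{id}")
      # ... parse the uploaded file and insert its points ...
      update_attribute :inserted, true
    end
  end

  # re-importing the backlog is then just:
  Trace.find(:all, :conditions => ['inserted = ?', false]).each { |t| t.import }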
someone can summarise all the above and put them into tickets or
something.
The rails port has tripled, or something, the number of server
developers. I've been dropping responsibilities all over the place,
like handing over tile yesterday, but I'm still backed up despite
sitting at my macbook all day long. Unlike in the past you can't rely
on me to fix everything, and my responsibilities now span a lot more
tasks than just the code, unlike not so long ago. So, as always,
really feel free to take responsibility for something and run
with it.
Oh and someone should announce OSM as a rails app to the rails list
and try and attract developers from there. Advocacy at your friendly
local linux or ruby group may also help.
On 12 May 2007, at 05:12, Frederik Ramm wrote:
> Hi,
>
> I am still frequently seeing "500 internal server error"s when
> uploading stuff, and also when I try to render a tile with the t at h
> client, in 5% to 10% of cases the server just won't reply at all
> (timeout).
>
> I didn't make a big fuss about this, believing that a lot of
> "behind the
> scenes" work is going on, that these problems must be well known and
> being worked on; but today I took a look at trac and the tickets I see
> there are a lot of little things that are broken - but no
> indication of
> the fact that we're currently still in a sort of maintenance mode
> with a
> noticeable proportion of calls to the database server simply not
> working.
>
> Put in other words, our trac looks like this:
>
> +--------------------------------+
> | White Star Line |
> | HMS Titanic |
> | To-Do List |
> | |
> | 1. Doorknob loose in cabin #23 |
> | 2. Toilet not working in #55 |
> | 3. ... |
> +--------------------------------+
>
> This is not a complaint - I am just a bit unsure, because there's not
> that much information on this mailing list about what people are
> working
> on with the server, and until now I believed they're working on fixing
> the obvious problems, and if this is true then just go on and
> ignore me.
>
> I am just writing this on the slim chance that everybody except me
> believes everything is working fine, and later I am told "why
> didn't you
> raise the issue when you had all those errors...?"
>
> I'll open a ticket for the internal server errors, just to be on the
> safe side.
>
> (If help is wanted identifying the problems, I am ready to try my luck
> any time, but I'd need access to the db server including server-
> restart,
> tcpdump, and strace privileges, and I can understand if you
> hesitate to
> hand these out to just anyone. I have already tried to reproduce the
> problem using a locally-installed rails server but that, being neither
> under much load nor having too much data, doesn't cough up.)
>
> Bye
> Frederik
>
>
have fun,
SteveC | steve at asklater.com | http://www.asklater.com/steve/