[OSM-dev] Status of Database Server after 0.4 Upgrade: Fragile
SteveC
steve at asklater.com
Sat May 12 21:36:03 BST 2007
So here's a braindump:
HTTP 500's. The server is running lighttpd on top of fcgi using
the script/process scripts. These are running with 3 threads with a
restart time of 2 seconds. The threads commit suicide after some
number of requests (which I will increase in a sec) to either the map
or gpx call. When that happens, lighttpd and fcgi are supposed to
pause the queue of requests (which may or may not be api requests).
My suspicion is that the queue is getting killed (and thus a bunch of
500's) instead.
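For anyone trying to reproduce this, the relevant lighttpd block looks
roughly like the below (paths and socket names are illustrative, not
copied from the live config; the suicide-after-N-requests behaviour
lives in the dispatch / script/process side, not here):

  fastcgi.server = ( ".fcgi" =>
    ( "rails" =>
      (
        "socket"    => "/tmp/osm-rails.socket",
        "bin-path"  => "/path/to/rails/public/dispatch.fcgi",
        "min-procs" => 3,
        "max-procs" => 3
      )
    )
  )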
Looking through the logs the rails stuff itself is not producing the
500s, it's something to do with the threads killing themselves.
If you have no tech experience you can summarise this on the rails
list and see if anyone can help.
If you do then you can set up a similar base and test the suicide stuff.
stats. the stats script has various errors and is outputting to the
wrong directory.
database sessions. someone needs to make a migration for storing
sessions in the database and the appropriate changes in
environment.rb or whatever
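To save whoever takes this some digging, the standard Rails recipe is
roughly the below (a sketch from memory, check it against the docs; I
think there's a rake db:sessions:create task that generates the
migration for you):

  class AddSessionsTable < ActiveRecord::Migration
    def self.up
      create_table :sessions do |t|
        t.column :session_id, :string
        t.column :data,       :text
        t.column :updated_at, :datetime
      end
      add_index :sessions, :session_id
    end

    def self.down
      drop_table :sessions
    end
  end

  # and in config/environment.rb, inside the Rails::Initializer block:
  config.action_controller.session_store = :active_record_store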
authentication. I have a hunch that we could do authentication only
on PUT/DELETE, using request.get? or something similar in the auth
code. Then, we could ask clients to send Accept headers asking for
text/xml for data from the api. Then, depending on whether the header
is present, you could spit out text/html, JSON, rdf...
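Something like this is the shape I mean (method names invented for
illustration, not what's in the codebase right now):

  def require_auth_for_writes
    return true if request.get?   # reads stay anonymous
    authorize                     # the existing basic auth check
  end

  # read calls could then key the output off the Accept header:
  respond_to do |format|
    format.xml  { render :text => doc.to_s, :content_type => "text/xml" }
    format.html { render :template => 'some/html/view' }
    # json / rdf would each need Mime::Type.register and another branch
  end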
way tags. most of the api code is fairly fast but the api map call
does an SQL request for every way to get its tags. it shouldn't;
someone needs to fix it.
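Untested, but the fix is probably along the lines of pulling all the
tags for the ways in the bbox in one query and bucketing them in Ruby
(model and column names here are guesses, check them against the
schema):

  way_ids = ways.collect { |w| w.id }
  tags_by_way = Hash.new { |h, k| h[k] = [] }
  WayTag.find(:all, :conditions => ['id IN (?)', way_ids]).each do |t|
    tags_by_way[t.id] << t
  end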
the to_xml_node code is getting ugly and map-call-specific in the
node/seg/way models. it should be stripped down and the map call
should look more like it did in the 0.3 code, doing the xml
generation itself.
RSS is missing.
very-long-ways. the map call is failing / timing out on bboxes which
have ways with some large number of segments in them. by large, I mean
thousands and thousands apparently. Someone needs to (maybe with
planet.osm) find out how big these ways are and we need to think about
what to do. Should the API not return the data and give up on too many
segments? Should we split the ways up?
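As a first stab at finding out how bad it is, something like this from
script/console would do (table name is from memory, treat as a guess):

  rows = Way.find_by_sql("select id, count(*) as segs
                          from current_way_segments
                          group by id order by segs desc limit 20")
  rows.each { |r| puts "way #{r.id} has #{r.segs} segments" }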
GPX. the import code using rails daemons works but it looks like it
might stomp on the log file it shares with the fcgi processes.
That's what it looked like the other day when I was working on
it. Someone needs to test this using lighttpd and the script/process
stuff. The code needs to be split out into the trace.rb model so I
can do Trace.find(:all, :conditions => ['inserted = ?', false]).each
{|t| t.import } or something. That will make re-importing the backlog
of old GPX files easier. Someone needs to write the code to do that
back importing too. It should do something like 'delete from
gpx_points where id = n', where n is the id of the trace, in case
there are old points in the db before importing.
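Roughly the shape I have in mind, though the actual parsing lives in
the daemon code and would need moving across (names approximate):

  class Trace < ActiveRecord::Base
    def import
      # clear out stale points for this trace first, per the delete
      # above (check the exact column name against the gpx_points schema)
      connection.execute("delete from gpx_points where gpx_id = #{id}")
      # ... parse the uploaded file and insert its points ...
      update_attribute :inserted, true
    end
  end

  # re-importing the backlog is then just:
  Trace.find(:all, :conditions => ['inserted = ?', false]).each { |t| t.import }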
someone can summarise all the above and put them into tickets or
something.
The rails port has tripled, or something, the number of server
developers. I've been dropping responsibilities all over the place,
like handing over tile yesterday, but I'm still backed up despite
sitting at my macbook all day long. Unlike in the past you can't rely
on me to fix everything, and my responsibilities now span a lot more
tasks than just the code, unlike not so long ago. So, as always,
really feel free to take responsibility for something and run
with it.
Oh and someone should announce OSM as a rails app to the rails list
and try and attract developers from there. Advocacy at your friendly
local linux or ruby group may also help.
On 12 May 2007, at 05:12, Frederik Ramm wrote:
> Hi,
>
> I am still frequently seeing "500 internal server error"s when
> uploading stuff, and also when I try to render a tile with the t at h
> client, in 5% to 10% of cases the server just won't reply at all
> (timeout).
>
> I didn't make a big fuss about this, believing that a lot of
> "behind the
> scenes" work is going on, that these problems must be well known and
> being worked on; but today I took a look at trac and the tickets I see
> there are a lot of little things that are broken - but no
> indication of
> the fact that we're currently still in a sort of maintenance mode
> with a
> noticeable proportion of calls to the database server simply not
> working.
>
> Put in other words, our trac looks like this:
>
> +--------------------------------+
> | White Star Line |
> | HMS Titanic |
> | To-Do List |
> | |
> | 1. Doorknob loose in cabin #23 |
> | 2. Toilet not working in #55 |
> | 3. ... |
> +--------------------------------+
>
> This is not a complaint - I am just a bit unsure, because there's not
> that much information on this mailing list about what people are
> working
> on with the server, and until now I believed they're working on fixing
> the obvious problems, and if this is true then just go on and
> ignore me.
>
> I am just writing this on the slim chance that everybody except me
> believes everything is working fine, and later I am told "why
> didn't you
> raise the issue when you had all those errors...?"
>
> I'll open a ticket for the internal server errors, just to be on the
> safe side.
>
> (If help is wanted identifying the problems, I am ready to try my luck
> any time, but I'd need access to the db server including server-
> restart,
> tcpdump, and strace privileges, and I can understand if you
> hesitate to
> hand these out to just anyone. I have already tried to reproduce the
> problem using a locally-installed rails server but that, being neither
> under much load nor having too much data, doesn't cough up.)
>
> Bye
> Frederik
>
>
have fun,
SteveC | steve at asklater.com | http://www.asklater.com/steve/