[OSM-dev] [OSM-talk] donating read-only api-mirrors
Stefan de Konink
stefan at konink.de
Sat Feb 7 04:56:56 GMT 2009
Matt Amos wrote:
> well, i find join-style syntax easier, but essentially yes.
Finally we found a mutual understanding ;) I am happy we didn't got as
low as mathematics ;)
>> But for performance reason Matthias would like to split this process in two
>> distinct parts, so reuse of the resultset is possible?
>>
>> nodes := SELECT id FROM nodes WHERE BBOX(...)
>> ways := SELECT way FROM way_nds WHERE node IN (nodes);
>>
>> SELECT * FROM nodes WHERE IN(nodes) OR IN(
>> SELECT node FROM way_nds WHERE way IN (ways);
>> )
>
> indeed, with this method either the node and way IDs are cached (i.e:
> stored in memory) or the query is repeated (wasting time).
Some databases cache their results or even complete queries when noting
is modifies. Why would you like to outperform [insert database brand
here] with futile [insert language here]?
>> And while busy...
>>
>> SELECT * FROM ways WHERE id IN (ways);
>
> while simultaneously doing "select * from way_tags where id in (ways)"
> and "select * from way_nodes where id in (ways)" and merging them for
> output ;-)
You have this dbslayer setup that actually can do things at the same
time ;) I agree nifty ;) Otherwise lets wait until your connection is
served ;)
>> As pointed out before, I am nowhere near a MySQL expert; but is MySQL able
>> to make soup from those partial results (vulcano wise it has a caching
>> advantage here), and might be smart enough to reuse the results.
>
> i wouldn't count on it. mysql's query optimiser is frequently wrong.
Frequently means that it can be right too; did someone test that subsets
are cached within queries or not?
>> I have
>> benchmarked using my database engine that there is a tipping point in
>> reusing materialized resultsets (integers) versus rejoining them. If reusing
>> resultsets in MySQL is always cheap, no matter how big the query that might
>> be a good approach.
>
> there is probably a similar effect in mysql. but i am also not an
> expert, so i don't know the necessary voodoo incantations to get the
> best out of mysql ;-)
Trivial; I have created a small Python (eeew!) program that creates me a
list of random bbox'ed queries this is piped to a database client
results are grouped and counted and the time is saved. Now we can make a
nifty 3D plot with the size of the resultset, the time we spent on the
query and the area that we searched for nodes.
Now do this for any query method you want to bench and do something with
gnuplot ;) And see which query gives you the best overal performance. If
there are significant things to see in your output, you might partition
your query type based on the parameters, in this case bbox.
>> I frankly don't want to make problems bigger than they are, is the only
>> error message currently/0.6 http headers? I am sure the parser will get the
>> message if we just send broken XML back ;)
>
> currently all errors are indicated by HTTP status codes. i'm sure
> you'd agree that sending back broken content != indicating an error.
> in HTTP-based protocols its a good idea to be standards compliant and
> use the error reporting mechanisms that all clients (including wget,
> curl, etc...) will understand.
Maybe the API is HTTP based, the XML output is certainly part of another
language. I think that the amount of trouble streaming output saves
hinting an error in XML is minor sacrifice, especially when you relate
this to the final document validness and the chance it is suddenly
hitting an error when there already is output.
Stefan
More information about the dev
mailing list