[OSM-dev] [OSM-talk] donating read-only api-mirrors

Sat Feb 7 04:56:56 GMT 2009

Matt Amos wrote:
> well, i find join-style syntax easier, but essentially yes.

Finally we found a mutual understanding ;) I am happy we didn't got as 
low as mathematics ;)

>> But for performance reason Matthias would like to split this process in two
>> distinct parts, so reuse of the resultset is possible?
>>
>> nodes     := SELECT id FROM nodes WHERE BBOX(...)
>> ways      := SELECT way FROM way_nds WHERE node IN (nodes);
>>
>> SELECT * FROM nodes WHERE IN(nodes) OR IN(
>>  SELECT node FROM way_nds WHERE way IN (ways);
>> )
> 
> indeed, with this method either the node and way IDs are cached (i.e:
> stored in memory) or the query is repeated (wasting time).

Some databases cache their results or even complete queries when noting 
is modifies. Why would you like to outperform [insert database brand 
here] with futile [insert language here]?

>> And while busy...
>>
>> SELECT * FROM ways WHERE id IN (ways);
> 
> while simultaneously doing "select * from way_tags where id in (ways)"
> and "select * from way_nodes where id in (ways)" and merging them for
> output ;-)

You have this dbslayer setup that actually can do things at the same 
time ;) I agree nifty ;) Otherwise lets wait until your connection is 
served ;)

>> As pointed out before, I am nowhere near a MySQL expert; but is MySQL able
>> to make soup from those partial results (vulcano wise it has a caching
>> advantage here), and might be smart enough to reuse the results.
> 
> i wouldn't count on it. mysql's query optimiser is frequently wrong.

Frequently means that it can be right too; did someone test that subsets 
are cached within queries or not?

>> I have
>> benchmarked using my database engine that there is a tipping point in
>> reusing materialized resultsets (integers) versus rejoining them. If reusing
>> resultsets in MySQL is always cheap, no matter how big the query that might
>> be a good approach.
> 
> there is probably a similar effect in mysql. but i am also not an
> expert, so i don't know the necessary voodoo incantations to get the
> best out of mysql ;-)

Trivial; I have created a small Python (eeew!) program that creates me a 
list of random bbox'ed queries this is piped to a database client 
results are grouped and counted and the time is saved. Now we can make a 
nifty 3D plot with the size of the resultset, the time we spent on the 
query and the area that we searched for nodes.

Now do this for any query method you want to bench and do something with 
gnuplot ;) And see which query gives you the best overal performance. If 
there are significant things to see in your output, you might partition 
your query type based on the parameters, in this case bbox.

>> I frankly don't want to make problems bigger than they are, is the only
>> error message currently/0.6 http headers? I am sure the parser will get the
>> message if we just send broken XML back ;)
> 
> currently all errors are indicated by HTTP status codes. i'm sure
> you'd agree that sending back broken content != indicating an error.
> in HTTP-based protocols its a good idea to be standards compliant and
> use the error reporting mechanisms that all clients (including wget,
> curl, etc...) will understand.

Maybe the API is HTTP based, the XML output is certainly part of another 
language. I think that the amount of trouble streaming output saves 
hinting an error in XML is minor sacrifice, especially when you relate 
this to the final document validness and the chance it is suddenly 
hitting an error when there already is output.

Stefan