[OSM-talk] Bounding box

Mon Feb 5 21:17:19 GMT 2007

Hello Raphael

Insane queries are those whose very high demand on the server is not justified 
by the close-to-zero value of the result.

Raphaël Jacquot wrote:
> said insane queries may be due in part
None of the following. An insane query is as defined above, and is the result of 
malevolence or ignorance.

> * because the storage driver is not the proper one for the usage we have 
> (for instance, if we're using myisam)
> * because the request needs to read data that is in a separate table, 
> and for some reason locks said table (because of limitations in either 
> the storage driver, the database engine or both)
> * because the app doesn't use transaction facilities if available in the 
> database engine (which in turn would allow it to enable things like 
> "copy on modify" for the current transaction -- that is, the transaction 
> that modifies data sees its version while it's being processed, other 
> read-only transactions see the previous version, other write transactions 
> are blocked if they attempt to access this particular data)

Locking is a relatively minor issue. Lock waits occur when DB load causes 
otherwise fast queries to take a long time to execute. They are therefore a 
result of I/O or CPU load, in which case, even without locking, the blocked 
query would still be delayed by the same bottleneck.

> 
>> Unfortunately, the mechanism to filter insane queries (which were 
>> perhaps generated through malice or ignorance) catches a few sane 
>> queries, which occasionally will require modified workflow.
>>
>> Once we have upgraded the RAM in the API, we can consider increasing 
>> the node limit. When we partition the database or implement tiles, we 
>> can consider increasing the bbox limit.
> 
> there is no need to use partitioning or other 'hackish' schemes if the 
> indexing engine features the proper index generation algorithms (such as 
> R-tree based indexes)

I don't consider partitioning hackish.
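
For readers unfamiliar with the limits discussed in the quoted text above, here 
is a minimal sketch of that kind of filter. It is not the API's actual code: the 
names BBox, check_bbox and check_node_count and both limit values are 
hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical limits for illustration only; not the real API values.
MAX_BBOX_AREA_DEG2 = 0.25
MAX_NODES = 50_000


@dataclass(frozen=True)
class BBox:
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def area(self) -> float:
        """Area in square degrees (not true ground area)."""
        return (self.max_lon - self.min_lon) * (self.max_lat - self.min_lat)


def check_bbox(bbox: BBox) -> None:
    """Cheap pre-check, done before touching the database."""
    if bbox.min_lon >= bbox.max_lon or bbox.min_lat >= bbox.max_lat:
        raise ValueError("bounding box is empty or inverted")
    if bbox.area() > MAX_BBOX_AREA_DEG2:
        raise ValueError("bounding box too large")


def check_node_count(node_count: int) -> None:
    """Second check, done once the number of matching nodes is known."""
    if node_count > MAX_NODES:
        raise ValueError("too many nodes in bounding box")
```

A filter this crude rejects by size alone, which is why it will occasionally 
catch a sane query for a large but sparsely mapped area.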

> Of course, you'll then tell me that the index takes a lot of space or 
> something. well, that's what computing is all about, compromises.
> that is, either your data is very compressed and needs a lot of 
> computing power to sift through, or, you sacrifice storage space to have 
> a fast to go through data structure, and your data is fast to access. 
> you can't have both.

I agree in small part, but mostly disagree. I agree that our indexing scheme 
is sub-optimal. However, your assumption that the bbox and node limits are 
substantially a result of the indexing scheme is incorrect. I also strongly 
disagree with your hidden assumption that an OGC R-tree index as commonly 
implemented is anywhere near optimal, and with the implication that the space 
inefficiency of the OGC indexing scheme somehow makes the index perform better 
than any other theoretical model, or that a different scheme is somehow 
compressed.
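
To illustrate that a compact spatial index need not be "compressed" or costly 
to search, here is a sketch of one well-known alternative to an R-tree: a 
Z-order (Morton) key that interleaves the bits of quantised longitude and 
latitude into a single integer. This is purely an illustration, not the scheme 
used by the API or one proposed in this thread; the function name quadtile_key 
and the default of 16 bits per axis are assumptions.

```python
def quadtile_key(lon: float, lat: float, bits: int = 16) -> int:
    """Interleave quantised lon/lat into a single integer (Z-order key)."""
    # Quantise each coordinate to an integer in [0, 2**bits - 1].
    scale = (1 << bits) - 1
    x = round((lon + 180.0) / 360.0 * scale)
    y = round((lat + 90.0) / 180.0 * scale)

    # Interleave: x bits go to odd positions, y bits to even positions.
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i + 1)
        key |= ((y >> i) & 1) << (2 * i)
    return key
```

Points that share a tile share the high-order bits of the key, so an ordinary 
ordered index on one integer column can serve tile lookups as range scans. 
Whether that would beat an R-tree for this workload is not something the sketch 
shows.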