[OSM-dev] query to the database matching ids without certain key

Wed Aug 12 09:33:46 BST 2009

2009/8/9 Stephan Knauss <osm at stephans-server.de>:
> Jon Burgess wrote:
>> On Sun, 2009-08-09 at 21:27 +0200, Stephan Knauss wrote:
>>> I want to get for example all airport nodes that miss a name.
>
> I think I figured it out. There is a keyword "EXCEPT" that was unknown
> to me. I think it's providing the functionality I was looking for.
>
> Can someone query the whole planet? I currently only have an excerpt.
> The result should be the same 454 nodes Jon got out of the mapnik table.
>
> select airports.node_id from
>   (select node_id from node_tags where v='aerodrome') as airports except
>   (select distinct name.node_id from node_tags as name, node_tags as a
>        where a.node_id=name.node_id and a.v='aerodrome' and (name.k='name'))

A "better" query would be

select airports.node_id from node_tags as airports left join (select *
from node_tags where k='name') as name using (node_id) where
airports.v='aerodrome' and name.k is null and airports.k='aeroway';

why is this better?

1. node_tags only needs search twice where as your query needs to
search node_tags 3 times, node_tags is a very very large table and
hence its going to take a while.
2. the use of airports.k='aeroway' removes a distinct which is also
slow (as the database needs to sort and remove duplicates) also any
indexes are more likely to be used.
3. Use of join rather than except/is in is usually faster as it gives
the database more options to reorder its searches to optimize the
search.

My rather old database of just the UK says this query is about twice
the speed. (However it *may* give different results) but they are
probably both what you want. (Adding extra indexes may speed this up
further but these take time to add...)

Peter.