[OSM-dev] Anyone with a speedy gazetteer

David Earl david at frankieandshadow.com
Mon Jan 12 12:09:03 GMT 2009


While what Tom says may help performance, I don't think it is why things 
are running so desperately slowly at the moment.

It is mainly extremely slow at present because it has got into a state 
where daily updates are interfering with each other, and I haven't had a 
chance to look at why this is happening. It is getting worse, because 
each day's run is slower than the last, so the next day's starts up and 
interferes with it. I know it needs attention. While the updates are 
running, searches are much slower because they don't get much 
opportunity to run. I suspect this vicious circle started on a day when 
there was an exceptionally large update which ran for more than 24 
hours, running into the next.
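One way out of the overlap (a sketch of an assumption, not what the 
gazetteer currently does) would be to have each daily run take a named 
MySQL lock at startup and bail out immediately if the previous run still 
holds it:

```sql
-- GET_LOCK with a zero timeout: returns 1 if we acquired the lock,
-- 0 if another session (i.e. yesterday's run) still holds it.
-- The lock name is a made-up example.
SELECT GET_LOCK('gazetteer_daily_update', 0);
-- 1: we hold the lock, proceed with the update
-- 0: previous update still running, so exit instead of piling on

-- ... update statements ...

SELECT RELEASE_LOCK('gazetteer_daily_update');
```

The lock is also released automatically if the session dies, so a 
crashed update cannot wedge the next day's run.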

Secondly, I introduced a bug whereby it is not being as selective as it 
should be with qualified searches. This means some searches are doing 
more work than they need to.

I am happy to try InnoDB, and will do this when I reload the database, 
which I had hoped to do over Christmas, but didn't get to. But I doubt 
it will make any significant difference. It would if there were
locking going on, but there isn't. The delay is mainly that I didn't 
want to just reload it in its present form, but to change things to 
support more flexible and more international searches, and a better 
algorithm for avoiding duplicates, all of which needs time to code.
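For reference, the switch Tom suggests is a plain ALTER per table, 
though it is slow on big tables because MySQL rebuilds each one; the 
table names below are made-up examples, not the gazetteer's actual 
schema:

```sql
-- Convert each MyISAM table to InnoDB (rebuilds the table and indexes):
ALTER TABLE place ENGINE = InnoDB;
ALTER TABLE place_name ENGINE = InnoDB;

-- Afterwards, list any tables still not on InnoDB:
SHOW TABLE STATUS WHERE Engine <> 'InnoDB';
```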

PHP is neither here nor there, and 'pile' is pejorative. Ruby is 
generally slower from what I have read. But most of the time is not in 
the application but in the queries. Recoding in C(++) might make things 
faster, especially the update (though doing one without the other means 
duplicating large amounts of code) but that is like starting from 
scratch, and I'm not convinced that it would give orders of magnitude 
increases - I think the biggest drain is the MySQL indexing. Delete 
operations seem especially slow. I have no control over this. 
Nevertheless, C is an option worth exploring in the longer term.

There may be some mileage in alternating: one database doing updates 
while the other provides search. But this actually nearly doubles 
the update load. Doing it all on one machine (or one MySQL server, 
though surely it is threaded) may not therefore be productive. There is 
no advantage in using one database to compute the updates and then 
applying the resulting SQL to the other, because this still puts the 
indexing load on the server doing the searching.
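If the alternating idea were tried on a single server, the cut-over at 
least need not interrupt searches: MySQL's multi-table RENAME swaps a 
set of tables atomically. A sketch, with hypothetical table names:

```sql
-- Build tomorrow's data in place_new while searches keep reading
-- the live tables, then swap the two in one atomic step:
RENAME TABLE place     TO place_old,
             place_new TO place;

-- place_old can then be dropped or reused for the next build.
```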

I have no idea how to implement a query timeout in MySQL. Even if the 
PHP dies and the connection is dropped, the load seems to persist in the 
server. Any pointers that would help me do this are welcome, though as 
I say, the searches are being impeded at the moment rather than being 
intrinsically slow.
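For the record, MySQL has no built-in per-query timeout, but a watchdog 
run periodically (from cron, say) can do it by hand with KILL QUERY, 
which aborts the running statement server-side - exactly the load that 
survives a dropped PHP connection. A sketch, not tested against the live 
server; the thread id is an example taken from the processlist output:

```sql
-- List everything currently running, with its Id and Time columns:
SHOW FULL PROCESSLIST;

-- For any query that has run longer than some threshold (say 30s),
-- abort the statement without dropping the client's connection:
KILL QUERY 1234;
```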

Finally, it is never going to be as fast as Google, because they can 
afford to throw massive amounts of networked resources at the problem, 
and massive amounts of paid staff time. We have one person and one 
computer at the moment.

David

On 12/01/2009 11:04, Tom Hughes wrote:
> Erik Johansson wrote:
> 
>> I have two questions; why is gazetteer.openstreetmap.org so slow  at
>> 20-50 seconds per request, and if anyone has code for faster variants
>> of name finders? Speed is essential..
> 
> Because it's a pile of PHP code that is hardcoded to use MySQL as the 
> backend and which uses MyISAM tables in an environment to which they are 
> not at all suited.
> 
> As I have explained a number of times, in order for the current 
> namefinder technology to have any chance of working properly the 
> following things need to happen:
> 
>    - All tables need to be switched to InnoDB
> 
>    - A query timeout needs to be implemented to abort long
>      running queries
> 
> Switching to PostgreSQL would probably be even better.
> 
> Once the above things are done we will be able to see whether the 
> current schema/technology are actually workable or not.
> 
> Tom
> 




