[OSM-dev] Anyone with a speedy gazetteer

Sascha Silbe sascha-ml-gis-osm-dev at silbe.org
Mon Jan 12 11:45:12 GMT 2009


On Mon, Jan 12, 2009 at 10:57:24AM +0100, Erik Johansson wrote:

> I have two questions; why is gazetteer.openstreetmap.org so slow  at
> 20-50 seconds per request,
I remember David Earl mentioning a scheduled database rebuild. Perhaps 
it's currently running. Otherwise, I'd really like to know the reasons 
(so I can test my own implementation accordingly).

> and if anyone has code for faster variants of name finders? Speed is 
> essential..
I have written an alternative, but suspended its development because
a) the current Namefinder was sufficiently fast when I tried to compare 
them => I don't know under which conditions it's slow, so I cannot test 
whether my implementation really is faster in these cases; and
b) I haven't had enough time to finish it.

The core (written in C/C++) is almost finished (only some minor changes 
for name canonicalization needed). What's really missing is the 
"front-end" part, i.e. the web server and search plan generation. A 
Python module implementing the protocol between Core and Frontend is 
included, so someone else could start implementing it right away. :)


To give some funny, unscientific numbers:

General conditions: planet-080813.osm.gz, Athlon 64 BE-2300 dual-core 
1.9GHz, 4GB DDR2-800, limit of 100 ways + 100 nodes per step

Database size: ~8GB
import time: 206 minutes
startup time: 41 seconds
RAM usage after startup (-> names and hash tables): 501MB

name search ("Provenceweg", exact):            0.001117s
loc search (48.51 9.07 48.52 9.08):            0.061777s
exact name + loc:                              0.001331s
loc + exact name:                              0.005612s
name search ("Provenceweg", regex):            2.582227s
loc + regex:                                   0.232248s
exact name + loc (whole world):                2.263841s
loc (whole world):                             0.069157s
name search ("Provenceweg", substring):        0.600431s
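
As a rough illustration of why the order of steps matters in the numbers above (this is not the osmsearch code; the class, its methods and the bounding-box order min_lat, min_lon, max_lat, max_lon are all assumptions made for the example), a toy in-memory index might look like the sketch below. An exact lookup hits a hash table, whereas substring and regex searches have to scan every name, and applying the more selective step first leaves less work for the second one:

```python
import re
from collections import defaultdict


class NameIndex:
    """Hypothetical toy index; not the data structure used by osmsearch."""

    def __init__(self, places):
        # places: iterable of (name, lat, lon) tuples
        self.places = list(places)
        self.by_name = defaultdict(list)
        for i, (name, _lat, _lon) in enumerate(self.places):
            self.by_name[name].append(i)

    def exact(self, name):
        # Hash-table lookup: cost does not depend on the number of names.
        return list(self.by_name.get(name, []))

    def substring(self, needle):
        # Linear scan over all names.
        return [i for i, (name, _, _) in enumerate(self.places) if needle in name]

    def regex(self, pattern):
        # Linear scan as well, with a regex match per name.
        rx = re.compile(pattern)
        return [i for i, (name, _, _) in enumerate(self.places) if rx.search(name)]

    def in_bbox(self, ids, bbox):
        # Assumed order: (min_lat, min_lon, max_lat, max_lon).
        min_lat, min_lon, max_lat, max_lon = bbox
        return [
            i for i in ids
            if min_lat <= self.places[i][1] <= max_lat
            and min_lon <= self.places[i][2] <= max_lon
        ]


index = NameIndex([
    ("Provenceweg", 48.515, 9.075),
    ("Provenceweg", 52.0, 5.0),
    ("Beethovenweg", 48.516, 9.076),
])
# "exact name + loc": the cheap, selective name lookup runs first,
# then only its few hits are checked against the bounding box.
hits = index.in_bbox(index.exact("Provenceweg"), (48.51, 9.07, 48.52, 9.08))
```

A real implementation would use a spatial index for the location step instead of filtering a list, so the sketch only shows the shape of a two-step search plan.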


Unfair comparison with current namefinder (my code: local, no frontend; 
gazetteer: remote; with caching, i.e. first result is discarded):

                                old       new
Beethovenweg                    0.681s    0.008970s
random string (=> 0 results)    0.310s    0.000352s
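
For anyone wanting to repeat this kind of measurement, a minimal timing sketch could look like the following. The helper names `time_query` and `speedup` are made up for this example, and averaging over several repeats is an assumption; the only part taken from the description above is that the first (cache-warming) request is discarded. The ratios at the end are plain arithmetic on the two rows of the table and inherit the unfairness of the comparison (local vs. remote).

```python
import time


def time_query(query, *args, repeats=5):
    """Mean wall-clock seconds per call, after one discarded warm-up call."""
    query(*args)  # warm-up call; its timing is thrown away
    start = time.perf_counter()
    for _ in range(repeats):
        query(*args)
    return (time.perf_counter() - start) / repeats


def speedup(old_seconds, new_seconds):
    """How many times faster the new timing is than the old one."""
    return old_seconds / new_seconds


# Ratios for the two rows of the table above.
beethovenweg_ratio = speedup(0.681, 0.008970)   # roughly 76x
random_string_ratio = speedup(0.310, 0.000352)  # roughly 880x
```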



I've released my code under GPLv2 in my arch repository [1] as 
osmsearch--devel--0.1 (I haven't come up with a better name yet, 
sorry).
You'll need some data files [2-4] as well.

Instructions for fetching the source (assuming the GNU arch client "tla" 
is installed):

1. tla register-archive http://sascha.silbe.org/arch/sascha-arch@silbe.org--2008
2. tla get sascha-arch@silbe.org--2008/osmsearch--devel--0.1 osmsearch--devel--0.1

This will put the source into a newly-created directory called 
"osmsearch--devel--0.1".


[1] http://sascha.silbe.org/arch/sascha-arch@silbe.org--2008
[2] http://sascha.silbe.org/tmp/osmsearch.conf
[3] http://sascha.silbe.org/tmp/nodesDisplay.rules
[4] http://sascha.silbe.org/tmp/waysDisplay.rules

CU Sascha

-- 
http://sascha.silbe.org/
http://www.infra-silbe.de/