[OSM-dev] Bulk batch address search

Sarah Hoffmann lonvia at denofr.de
Fri Jun 15 20:38:43 UTC 2018


Hi,

On Tue, Jun 12, 2018 at 12:04:46AM +0200, Julien Cochennec wrote:
> Hi,
> I work for a big stats institute that has millions of addresses stored in
> an Oracle database.
> Data interacts with a SQL/JAVA search engine that is almost impossible to
> port.
> We can't afford to pay this system anymore and only have a few months,
> maybe more than a year, to switch to a different system.
> Our software takes addresses in big files from external providers, adds
> geocoding data and stats to each address and returns the extended data to
> providers as bigger files.
> 
> We need to switch to PostgreSQL, so I was thinking about:
> - turning our address data into OSM format
> - turning our non-geo data (administrative confidential data) into tags
> related to geo address data
> - putting all this on our own Nominatim instance server with only French
> addresses

I'm not sure that is directly possible. Nominatim is specifically made
to process OSM data. That means it computes addresses out of admin
boundaries, streets and housenumber points. It cannot handle already
processed address data because it expects that there is also an object
for every part that appears in the address (street, city, county ...).

If you have just a database of addresses, you might be better off looking
into an Elasticsearch-based solution like photon or Pelias.

For photon you need to write a custom import script, but that should not be
too difficult. It expects a fairly well-structured record. The downside is
that you will have to go through the web API, and I cannot comment on the
search performance of such a solution.
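
To give an idea of what a "well-structured record" could mean, here is a
minimal sketch that turns one address into a JSON line. The field names and
the one-JSON-object-per-line layout are assumptions made for illustration,
not photon's actual import schema, and the sample values are dummies.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AddressRecord:
    # Hypothetical field names; a real import script must use whatever
    # schema the target search index actually expects.
    housenumber: str
    street: str
    postcode: str
    city: str
    country_code: str
    lat: float
    lon: float


def to_json_line(record):
    """Serialise one record as a single JSON line for a bulk import."""
    return json.dumps(asdict(record), ensure_ascii=False)


# Dummy values only, not a real address.
example = AddressRecord("1", "Rue Exemple", "00000", "Ville Exemple", "fr", 0.0, 0.0)
line = to_json_line(example)
```

Keeping every address part in its own field, rather than in one free-text
string, is what lets the importer index street, city and postcode separately.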

I don't know much about Pelias but it was made with support for external
datasets in mind.

> - developing a web interface based on existing OSM tools
> - developing scripts that would make the match evaluation between provider
> address and nominatim address database
> 
> So I need to know if it's possible to make millions of searches in a bulk
> process, via Nominatim, on the command line, from a big input file (let's say
> CSV) in a few hours, less than a whole night, searching through only French
> addresses. And how do I do that? I saw things about GeoPy but I don't want
> to slow the process with a web API, just the terminal.

Nominatim comes with a PHP script that lets you run searches directly,
without the web API. The output is the same, so the script and the web API
can be used interchangeably when you have your own local installation. If
you don't mind PHP, you can also use the script as a blueprint to write
your own export script that puts the data directly wherever you need it.
That would probably be the fastest.
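
As a rough illustration of such a batch run, the sketch below reads a CSV
file, looks up every address with several workers in parallel and writes the
rows back with coordinates appended. The `geocode` callable is a placeholder:
in practice it would wrap the PHP script or your own export code, whose
invocation depends on the installed version and is not shown here. The
`address` column name and the dummy lookup table are assumptions as well.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def geocode_csv(infile, outfile, geocode, workers=4):
    """Read addresses from infile, append lat/lon, write rows to outfile.

    `geocode` is any callable taking an address string and returning
    (lat, lon) or None. Returns (matched, total).
    """
    reader = csv.DictReader(infile)
    rows = list(reader)  # whole file in memory; fine for a sketch
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames + ["lat", "lon"])
    writer.writeheader()
    matched = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with rows.
        results = pool.map(geocode, (row["address"] for row in rows))
        for row, hit in zip(rows, results):
            if hit is not None:
                row["lat"], row["lon"] = hit
                matched += 1
            else:
                row["lat"], row["lon"] = "", ""
            writer.writerow(row)
    return matched, len(rows)


def fake_geocode(address):
    # Stand-in for the real lookup; coordinates are dummy values.
    table = {"1 rue Exemple, Ville": (1.0, 2.0)}
    return table.get(address)


src = io.StringIO('id,address\n1,"1 rue Exemple, Ville"\n2,nowhere\n')
dst = io.StringIO()
stats = geocode_csv(src, dst, fake_geocode)
```

Unmatched rows are kept with empty coordinates so that the output file has
the same number of lines as the input and the match rate can be checked
afterwards.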

As for the number of requests, that really depends on the hardware you have
and what kind of data you process. The world-wide instance on
nominatim.osm.org peaks somewhere around 250 requests/s, but that's a pretty
beefy machine, see
https://hardware.openstreetmap.org/servers/dulcy.openstreetmap.org/
If you only want to process France, you might be able to set up a machine
that can hold most of the data in RAM and get more out of it. You should
be aware though that PostgreSQL can be a beast and it might take quite a
bit of trial-and-error to get it configured right to handle that number of
requests.

> I guess there are fewer than 100 million addresses in our database. But
> providers sometimes give 3 million addresses in a file.

3 million in a couple of hours is certainly doable.
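
For a back-of-the-envelope check, the small sketch below converts between
batch size, time window and sustained request rate. The 2-hour and 8-hour
windows are assumed readings of "a couple of hours" and "a whole night"; the
250 requests/s figure is the peak load mentioned above, used here only as a
reference point, not as a measured limit for a France-only setup.

```python
def required_rate(n_addresses, hours):
    """Sustained requests per second needed to finish within `hours`."""
    return n_addresses / (hours * 3600)


def hours_needed(n_addresses, rate_per_s):
    """Hours a batch takes at a sustained rate of `rate_per_s`."""
    return n_addresses / rate_per_s / 3600


batch = 3_000_000
rate_2h = required_rate(batch, 2)      # about 417 requests/s
rate_8h = required_rate(batch, 8)      # about 104 requests/s
time_at_250 = hours_needed(batch, 250) # about 3.3 hours
```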

Kind regards

Sarah

> 
> It would be a win/win as we could become a great contributor to OSM having
> all our data in OSM format and also use almost all tools OSM has already
> provided. We already give some info like city administrative borders/shapes
> via OpenData program.
> 
> Thanks all for your help.

