[OSM-dev] Extracting info from XML open street map file

mick bareman at tpg.com.au
Wed Feb 3 01:11:42 UTC 2016

On Fri, 29 Jan 2016 09:20:30 +0100
Jochen Topf <jochen at remote.org> wrote:

> On Wed, Jan 27, 2016 at 03:06:25 +0000, mick wrote:
> > On Tue, 26 Jan 2016 15:52:03 +0100
> > Jochen Topf <jochen at remote.org> wrote:
> >   
> > > On Tue, Jan 26, 2016 at 01:10:06 +0000, mick wrote:
> > > > I've been struggling on and off for a few years to extract usable subsets
> > > > from OpenStreetMap files, with very limited success. osm2pgsql produces the
> > > > best results but depends on knowing all the keys in the input file.
> > > 
> > > You might be looking for hstore:
> > > http://wiki.openstreetmap.org/wiki/Osm2pgsql#hstore
> > > 
> > > Then again, this might not do what you want. It depends on what you mean by
> > > "usable subsets from OpenStreetMap". Maybe you can explain some more about
> > > what you are trying to achieve in the end.
> > > 
> > > Jochen
> > > -- 
> > > Jochen Topf  jochen at remote.org  http://www.jochentopf.com/  +49-351-31778688
> > >   
> > The final result I'm after is a set of themed MapInfo layers (coastline, waterways, Roman roads, Roman settlements, etc.). The reason I'm extracting tags is to build a complete list of them for a comprehensive osm2pgsql .style file, and then store the refined data in a PostGIS database.
> > 
> > Due to limitations in MapInfo, a record is limited to 4000 characters and a field to 254 characters, so using hstore fails due to truncation.
> There are over 57,000 keys and over 70 million distinct tags in the OSM
> database, so there is no way you can bring them all into MapInfo layers.

Using the Great Britain dump from Geofabrik, I found 2.1 million key/value pairs and filtered them down to ~9,000 unique keys. Of those I was interested in about 20, which required selecting about 120 keys to cover the multitude of spelling and punctuation variations. From there I wrote a C program (I'm getting too old to get my head around all these new-fangled object-oriented scripting languages) to read the .osm file and write an osm2pgsql .style file.
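The program itself isn't in the post, but a minimal C sketch of the same idea might look like this. It assumes the one-tag-per-line layout that .osm extracts usually have (`<tag k="..." v="..."/>`, double-quoted), does no XML entity decoding, and uses a short hypothetical `WANTED` list in place of the real ~120 keys. The `linear` flag is a placeholder: osm2pgsql's default style marks area-like keys such as `natural` as `polygon`, so each key needs its own choice.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Hypothetical shortlist of wanted keys; the real list had ~120 entries. */
static const char *WANTED[] = { "highway", "waterway", "natural", "historic", "roman_road" };
#define N_WANTED (sizeof WANTED / sizeof WANTED[0])

/* Copy the value of the k="..." attribute of a <tag> on this line into out.
 * Returns 1 if a key was found, 0 otherwise.  Assumes one tag per line and
 * double quotes; entities such as &amp; are left undecoded. */
static int extract_key(const char *line, char *out, size_t outsz)
{
    const char *p = strstr(line, "<tag ");
    if (p == NULL)
        return 0;
    p = strstr(p, " k=\"");
    if (p == NULL)
        return 0;
    p += 4;                             /* skip past  k="  */
    const char *end = strchr(p, '"');
    if (end == NULL)
        return 0;
    size_t len = (size_t)(end - p);
    assert(outsz > 0);
    if (len >= outsz)
        len = outsz - 1;                /* truncate overlong keys */
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}

/* Scan an .osm stream, count how often each wanted key occurs, and write
 * an osm2pgsql .style line for every wanted key that actually appears. */
static void write_style(FILE *osm, FILE *style, long counts[N_WANTED])
{
    char line[4096], key[256];

    memset(counts, 0, N_WANTED * sizeof counts[0]);
    while (fgets(line, sizeof line, osm) != NULL) {
        if (!extract_key(line, key, sizeof key))
            continue;
        for (size_t i = 0; i < N_WANTED; i++)
            if (strcmp(key, WANTED[i]) == 0)
                counts[i]++;
    }

    fprintf(style, "# OsmType  Tag  DataType  Flags\n");
    for (size_t i = 0; i < N_WANTED; i++)
        if (counts[i] > 0)              /* "linear" is a placeholder flag */
            fprintf(style, "node,way   %-20s text  linear\n", WANTED[i]);
}
```

The `counts` array doubles as a report of how often each wanted key occurs, which helps when deciding which of the variant spellings are worth keeping.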

The next step was to run osm2pgsql to create a database, then load it into QGIS. So far so good. Then I tried to export from QGIS and hit a brick wall: an update to a dependency in Arch Linux has caused it to segfault when I click 'Save As', so I have to track that down. Meanwhile I tried ogr2ogr, but lost all of England north of the Thames.

> You can bring it all into osm2pgsql using hstore, but any of the classical GIS 
> formats such as Shapefiles or MapInfo are just not made for this kind of data.
> In addition there is lots of noise in the data, lots of different tags that
> should really be the same etc.
About 20%
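One way to fold the spelling and punctuation variants into fewer keys before building the .style list is a canonicalisation step. The sketch below is a hypothetical helper, not part of osm2pgsql: it lower-cases keys and collapses runs of spaces, hyphens and underscores into one `_`, while leaving OSM's `:` namespace separator alone. It only catches case and punctuation noise; genuine misspellings would still need a hand-made alias table.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Fold a raw OSM key into a canonical form: lower-case letters, turn each
 * run of ' ', '-' or '_' into a single '_', and drop leading/trailing
 * separators.  "Roman Road", "roman-road" and "roman__road" all become
 * "roman_road".  ':' (as in name:en) is kept as-is. */
static void normalize_key(const char *in, char *out, size_t outsz)
{
    size_t n = 0;
    int pending_sep = 0;

    assert(outsz > 0);
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (c == ' ' || c == '-' || c == '_') {
            if (n > 0)
                pending_sep = 1;        /* separators before any letter are dropped */
            continue;
        }
        if (pending_sep && n + 2 < outsz)
            out[n++] = '_';             /* one '_' for the whole run */
        pending_sep = 0;
        if (n + 1 < outsz)
            out[n++] = (char)tolower(c);
    }
    out[n] = '\0';                      /* a trailing run is never emitted */
}

/* Return 1 if two raw keys are variants of the same canonical key. */
static int same_key(const char *a, const char *b)
{
    char na[256], nb[256];

    normalize_key(a, na, sizeof na);
    normalize_key(b, nb, sizeof nb);
    return strcmp(na, nb) == 0;
}
```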

> You have to decide what you are interested in first,
That was the purpose of the first stage of this exercise: I needed to find the keys that describe the data I want.

> then set up some kind of data conversion pipeline that reads OSM data and spits
> out cleaned up data in the format you want. There are several ways to do this
> and going through osm2pgsql with hstore is not the worst.
Time to start learning PostgreSQL and the MapInfo Interchange Format.
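As a starting point for the MapInfo side, here is a minimal C sketch of writing point features as a MIF/MID pair, with every attribute cut to the 254-character field limit mentioned above. The `struct feature` type and its two columns are hypothetical; the header assumes WGS84 longitude/latitude (`CoordSys Earth Projection 1, 104`) and a Latin-1 charset, embedded double quotes are simply swapped for single quotes, and the truncation counts bytes, so a multi-byte UTF-8 character could be cut in half.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

#define MAPINFO_FIELD_MAX 254   /* per-field limit quoted in the thread */

/* Hypothetical point feature: two attribute columns plus lon/lat. */
struct feature {
    const char *name;
    const char *historic;
    double lon, lat;
};

/* Write one attribute as a quoted MID value, cut to the field limit.
 * Embedded double quotes become single quotes to keep the sketch simple. */
static void mid_field(FILE *mid, const char *s)
{
    size_t n = strlen(s);

    if (n > MAPINFO_FIELD_MAX)
        n = MAPINFO_FIELD_MAX;
    fputc('"', mid);
    for (size_t i = 0; i < n; i++)
        fputc(s[i] == '"' ? '\'' : s[i], mid);
    fputc('"', mid);
}

/* Write the MIF header plus one Point per feature, and the matching
 * comma-delimited attribute rows to the MID file. */
static void write_mif_mid(FILE *mif, FILE *mid,
                          const struct feature *f, size_t count)
{
    assert(mif != NULL && mid != NULL);
    fputs("Version 300\n"
          "Charset \"WindowsLatin1\"\n"
          "Delimiter \",\"\n"
          "CoordSys Earth Projection 1, 104\n"
          "Columns 2\n"
          "  name Char(254)\n"
          "  historic Char(254)\n"
          "Data\n\n", mif);
    for (size_t i = 0; i < count; i++) {
        /* 7 decimals matches the precision OSM stores coordinates at */
        fprintf(mif, "Point %.7f %.7f\n", f[i].lon, f[i].lat);
        mid_field(mid, f[i].name);
        fputc(',', mid);
        mid_field(mid, f[i].historic);
        fputc('\n', mid);
    }
}
```

Lines and polygons would need `Pline` and `Region` objects instead of `Point`, but the header and MID handling stay the same.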

If anyone is interested in collaborating, drop me a mail.

Many thanks for your help

