danielsabo at gmail.com
Thu Feb 17 01:57:58 GMT 2011
There has actually been a Python script to generate a simple SQLite database from OSM XML for years, but transforming the data into lines and polygons really is the hard part.
I did find a version of SpatiaLite that supported BuildArea after I had written the multipolygon code, but I've decided to keep the code in osm2spatialite because it does a much better job of generating valid multipolygons than BuildArea can (BuildArea has no concept of the inner and outer roles). In the future there's room for osm2spatialite to also fix self-intersecting polygons, but since they don't upset Mapnik I haven't gotten around to it.
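To illustrate why the roles matter, here is a rough sketch of role-aware ring assembly. It is not the code in osm2spatialite; the function names and the (role, node-id list) input shape are made up for the example, and treating an empty role as outer is an assumption. It only joins member ways into closed rings per role; matching each inner ring to its enclosing outer ring would additionally need a point-in-polygon test on the coordinates.

```python
def assemble_rings(ways):
    """Join ways (lists of node ids) end to end into closed rings.

    Returns (rings, leftovers): the closed rings, plus any chains that
    could not be closed. Hypothetical helper, for illustration only.
    """
    remaining = [list(w) for w in ways if len(w) >= 2]
    rings, leftovers = [], []
    while remaining:
        chain = remaining.pop()
        extended = True
        # Keep attaching ways to the tail until the chain closes
        # or nothing else connects.
        while chain[0] != chain[-1] and extended:
            extended = False
            for i, way in enumerate(remaining):
                if way[0] == chain[-1]:
                    chain.extend(way[1:])
                elif way[-1] == chain[-1]:
                    # The way runs in the opposite direction; reverse it.
                    chain.extend(reversed(way[:-1]))
                else:
                    continue
                del remaining[i]
                extended = True
                break
        # A valid ring is closed and has at least three distinct nodes.
        if chain[0] == chain[-1] and len(chain) >= 4:
            rings.append(chain)
        else:
            leftovers.append(chain)
    return rings, leftovers


def build_multipolygon(members):
    """members: (role, [node ids]) pairs from one multipolygon relation."""
    # Assumption: members with an empty role are treated as outer.
    outers, _ = assemble_rings([w for role, w in members if role in ("outer", "")])
    inners, _ = assemble_rings([w for role, w in members if role == "inner"])
    return {"outer": outers, "inner": inners}


members = [
    ("outer", [1, 2, 3]),
    ("outer", [3, 4, 1]),
    ("inner", [10, 11, 12, 10]),
]
result = build_multipolygon(members)
```

Keeping the outer and inner members apart before joining is the point: a role-blind approach sees only a pile of linework and has to guess which rings are holes.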
The memory usage comes from generating the osm2pgsql-style database, which requires keeping all the geometry data in RAM to build the lines and polygons. I do have code that caches it and keeps memory usage < 100M, but it's significantly slower for data that fits in RAM. If osm2pgsql is any indication, it's probably faster to just fill up swap space, but I haven't had the time to let it run against any multi-GB files.
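The trade-off can be sketched with a small node cache backed by SQLite instead of a Python dict. This is only an illustration of the idea, not the caching code mentioned above; the NodeCache class and its table layout are invented for the example. Each lookup becomes a query, which is where the slowdown comes from when the data would have fit in RAM anyway.

```python
import sqlite3


class NodeCache:
    """Keeps node coordinates in SQLite rather than in a Python dict.

    Hypothetical sketch. Pass a file path to keep the data on disk;
    the default ":memory:" is only convenient for trying it out.
    """

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS nodes "
            "(id INTEGER PRIMARY KEY, lon REAL, lat REAL)"
        )

    def add_many(self, nodes):
        """nodes: iterable of (id, lon, lat) tuples."""
        with self.db:  # commits once for the whole batch
            self.db.executemany(
                "INSERT OR REPLACE INTO nodes VALUES (?, ?, ?)", nodes
            )

    def coords(self, node_ids):
        """Return (lon, lat) for each known id, in the order requested."""
        out = []
        for nid in node_ids:
            row = self.db.execute(
                "SELECT lon, lat FROM nodes WHERE id = ?", (nid,)
            ).fetchone()
            if row is not None:  # skip nodes missing from the extract
                out.append(row)
        return out


cache = NodeCache()
cache.add_many([(1, 10.0, 50.0), (2, 10.5, 50.5)])
line = cache.coords([2, 1, 99])
```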
The other reason I expect to stick with Python for this is that it makes it much easier to prototype preprocessing code, and at some point I'd like to support things like the label and admin_center roles on boundary multipolygons.
On Feb 16, 2011, at 5:37 AM, Axel Kollmorgen wrote:
> On 2011-01-14 13:33, Daniel Sabo wrote:
>> I have a beta version of an osm2spatialite script written in python.
>> Neither the GEOS C API nor SpatiaLite has a version of BuildArea
>> so most of the code is devoted to assembling multipolygons, so
>> that's the area it's most likely to produce different results than
> SpatiaLite has BuildArea() support since 2010-04-01 (svn) / 2010-08-15
> (2.4.0 RC3) [1,2]. also, spatialite-tools include the spatialite_osm_*
> tools, among them spatialite_osm_raw, which "simply acquir[es] OSM XML into DB tables (fully preserving the XML layout)" . the schema generated by spatialite_osm_raw is different from the osm2pgsql schema, but all the important ingredients (raw data for nodes, ways, relations; tags; join tables for ways and relations; geometry for nodes) are there. it wouldn't require too much effort to transform this into the osm2pgsql schema, either by postprocessing it with sql, or (probably the better idea) by adapting spatialite_osm_raw.c. we might even suggest a spatialite_osm_pgsql tool to Alessandro "Sandro" Furieri, the developer of SpatiaLite - he is known for magically producing code for many feature requests.
>> Ram is an issue so I'm not sure python is the best choice in the long
>> term. Right now a 450MB xml file takes ~900MB to process. It is
>> fairly speedy though as long as you don't run out of memory.
> running spatialite_osm_raw on a 368MB osm file takes 76 secs and less than 30MB RAM (on a virtual debian server with 1GB RAM). this will take more if we also want osm2pgsql's geometry tables for lines, roads, and polygons. i still find this quite impressive, and i'm sure it is way more efficient than doing the processing in python.
>  http://groups.google.com/group/spatialite-users/browse_thread/thread/78b880074f1d84a0
>  http://www.gaia-gis.it/spatialite-2.4.0-4/spatialite-sql-2.4-4.html#p0
>  http://www.gaia-gis.it/spatialite-2.4.0-4/road_map.html
> dev mailing list
> dev at openstreetmap.org