[OSM-dev] Osmosis import to postgis - speed

Brett Henderson brett at bretth.com
Sat Dec 6 23:37:58 GMT 2008


The wiki hasn't been updated for the 0.6 tasks but there are new options 
on some of the tasks that will help here.

The pgsql_simple_schema_0.6.sql no longer includes a bbox or linestring 
column by default.  But there are two other scripts that add these columns:
pgsql_simple_schema_0.6_bbox.sql
pgsql_simple_schema_0.6_linestring.sql

Osmosis is smart enough to query the database schema to check which 
columns need to be populated for all tasks that interact directly with 
the database.

The next thing to note is that for large imports the --write-pgsql-dump 
task is far more appropriate.  It will generate dump files that can be 
loaded into the database via COPY statements.
The following script performs the steps necessary to load a set of dump 
files:
pgsql_simple_load_0.6.sql

Now for the slightly more complicated bit.  You have three options 
regarding the bbox and linestring columns.
1. If you don't need them, don't create them.  Your life will be 
simple.  Comment out the steps in the pgsql_simple_load_0.6.sql that 
refer to these columns.
2. Create those columns, and let pgsql_simple_load_0.6.sql populate them 
after the import is complete.  This may take a huge amount of time for 
large databases.
3. The --write-pgsql-dump task now has two options 
inMemoryLinestring=yes, and inMemoryBbox=yes which will cause the task 
to calculate values for those columns within osmosis.  They're named 
badly because it isn't actually in memory and uses a big temporary 
file.  This is slow, but much faster than doing it in pgsql.  If you 
need one or both of these columns add these options.  You will need to 
modify pgsql_simple_load_0.6.sql to comment out the update statements 
populating the linestring and bbox columns.

It's also worth noting that default values for postgres aren't very 
optimised.  I'm no expert in this space but large performance gains are 
possible if you increase the resources available to the database server.

So, to summarise.  Tune postgres if possible.  Don't use the 
--write-pgsql task, use --write-pgsql-dump instead.  If you don't need 
the bbox or linestring columns, don't use them.  If you do need one of them:
1. Add the relevant option to the --write-pgsql-dump task.
2. Modify the load script and comment out all update statements.
3. Modify the load script and comment out all statements referring to 
the column you don't need.

Hope that helps.  I need to release a new osmosis and update the wiki, 
but I don't have much time on my hands at the moment.  If anybody wishes 
to volunteer please let me know.

Brett


S Knox wrote:
> Thanks,
>
> I hadn't seen that. Whoops!
>
> ------------------------------------------------------------------------
> *From:* Karl Newman <siliconfiend at gmail.com>
> *To:* S Knox <roxyknox at yahoo.co..uk>
> *Cc:* dev at openstreetmap.org
> *Sent:* Saturday, 6 December, 2008 0:37:10
> *Subject:* Re: [OSM-dev] Osmosis import to postgis - speed
>
> On Fri, Dec 5, 2008 at 4:11 PM, S Knox <roxyknox at yahoo.co.uk 
> <mailto:roxyknox at yahoo.co.uk>> wrote:
>
>
>     I'm using Osmosis to import a recent UK extract of the planet file
>     into a database, and I used the simple schema here:
>     http://svn.openstreetmap.org/applications/utils/osmosis/script/pgsql_simple_schema_0.6.sql
>
>     This took around 1 hour.
>
>     I then found that this schema didn't give way geometries so re-ran
>     the query, adding a geometry column 'linestring' so that way
>     objects would be created.
>
>     This import has taken 24 hours so far and isn't yet finished. Is
>     this normal or has something gone wrong? It would be useful to
>     have some sort of counter to find out how far into the process
>     osmosis is - for example roughly what line of the XML it is on, as
>     a percentage of the whole file.
>
>
> Try this: 
> http://wiki.openstreetmap.org/wiki/Osmosis/DetailedUsage#--log-progress_.28--lp.29 
> Osmosis works on streams, so it can't know what percentage of the 
> total is complete.
>
> You can use an external bzip to decompress the file to stdout and then 
> pipe it to Osmosis.. Type a hyphen as the filename for Osmosis to 
> indicate it should read from stdin.
>
> Karl
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/listinfo/dev
>   





More information about the dev mailing list