[OSM-dev] Patch for osm2pgsql "duplicate keys"

Andrew M. Bishop amb at gedanken.demon.co.uk
Sun Dec 11 17:07:54 GMT 2011


Martijn van Oosterhout <kleptog at gmail.com> writes:

> On 11 December 2011 10:51, Andrew M. Bishop <amb at gedanken.demon.co.uk> wrote:
>> My own personal need is to import great_britain.osm and then add
>> ireland.osm both using data from geofabrik.  Either file can be
>> imported on its own and the only duplicate data comes from the tiny
>> overlap of the two data sets.  With this patch I can run osm2pgsql
>> twice, the first of which is fast because there is no duplicate data
>> to worry about and the second of which is fast because although there
>> is duplicate data there are not as many objects in the second file.
>> With this command line option it is faster to import now than it was
>> with both patches applied.
>>
>> osm2pgsql --create             [...] great_britain.osm
>> osm2pgsql --append --allow-dups [...] ireland.osm
>
> Ok, that's weird because --append is supposed to allow duplicates
> already, that's why the MODIFY flag exists already. Otherwise it would
> be a bit pointless.

I invite you to try it as I have.  In the places that matter for this
case the SVN code uses CREATE, not MODIFY.  The 2010 patch from the
mailing list that I referred to in my original e-mail is the one that
switches them to MODIFY.

Just to be clear, using these two commands (without the --allow-dups
option) with the SVN version of the intarray branch gives this error:

Reading in file: osm-data/ireland.osm
Processing: Node(3020k) Way(0k) Relation(0)COPY_END for COPY planet_osm_nodes FROM STDIN;
 failed: ERROR:  duplicate key value violates unique constraint "planet_osm_nodes_pkey"
CONTEXT:  COPY planet_osm_nodes, line 22424: "21043717	735445482	-55818613	\N"


> Secondly, your patch changes some places where it does MODIFY now to
> using CREATE, so I imagine it's going to break some stuff in normal
> use.

Yes, I did say that in my original e-mail.  I reverted the 2008 patch
from the mailing list and included that change only when enabled by my
new command line option.

What was originally written about the 2008 patch by you (Martijn van
Oosterhout) was:

: Umm, yeah. There's that. The way I solved it was with the patch below,
: which is a gross hack but it works. Basically it turns every create
: into a modify so it deletes any conflicting rows before inserting. It
: may be the only way, but I'm still thinking on it...

If you look at the diff to the code you will see that this patch
applies to only one out of the three cases (the one nearest the bottom
of the file).

In 2008 it was a "gross hack" but perhaps it is now considered part of
the normal operation; in that case you might want to leave that
change out of the new command line option.  If using MODIFY instead of
CREATE is slower, then making it selectable on the command line would
be an improvement but, as you say, it would change the current
behaviour.
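
To make the CREATE versus MODIFY distinction concrete, here is a minimal
sketch.  It is not osm2pgsql code: it assumes Python's sqlite3 module as a
stand-in for PostgreSQL, and the table "nodes" and the function
"insert_nodes" are hypothetical names chosen only for illustration.  A
plain insert (create) fails on the first overlapping node, while
delete-then-insert (modify) accepts the overlap.

```python
import sqlite3


def insert_nodes(conn, nodes, modify=False):
    """Insert (id, lat, lon) rows into the hypothetical 'nodes' table.

    With modify=True any existing row with the same id is deleted first,
    which mimics "turn every create into a modify".
    """
    cur = conn.cursor()
    for node_id, lat, lon in nodes:
        if modify:
            # Remove a conflicting row, if any, before inserting.
            cur.execute("DELETE FROM nodes WHERE id = ?", (node_id,))
        cur.execute(
            "INSERT INTO nodes (id, lat, lon) VALUES (?, ?, ?)",
            (node_id, lat, lon),
        )
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, lat INTEGER, lon INTEGER)"
    )
    first_file = [(1, 10, 20), (2, 30, 40)]
    second_file = [(2, 30, 40), (3, 50, 60)]  # node 2 is in the overlap

    insert_nodes(conn, first_file)  # like --create: no duplicates yet

    try:
        insert_nodes(conn, second_file)  # plain insert hits the primary key
        create_failed = False
    except sqlite3.IntegrityError:
        conn.rollback()
        create_failed = True

    insert_nodes(conn, second_file, modify=True)  # delete-then-insert works
    count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
    return create_failed, count


if __name__ == "__main__":
    print(demo())
```

The extra DELETE per row is the cost being discussed: it is wasted work when
the data contains no duplicates, which is why making it optional can help.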

-- 
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop                             amb at gedanken.demon.co.uk
                                   http://www.gedanken.org.uk/mapping/


