[Talk-us] Duplicates in data uploads (using JOSM) -- was: Re: [Imports] Uploads to City of Salisbury, MD

Jaakko Helleranta.com jaakko at helleranta.com
Thu Mar 22 14:47:20 GMT 2012


"With previous large uploads I have experience the same behaviour resulting
in massive dupes. So I guess it is not a conversion issue."

I don't have experience with conversions nor (mass) imports -- but I _have_
had "massive dupes" problems a number of times when uploading larger
amounts of data with JOSM over a bad connection. The problem has always
been related to the combination of large uploads and bad connections where
(if I understand right) the JOSM data upload connection gets a hiccup at
some point and isn't able to finish the job -- and doesn't leave a note for
itself about where it left off. Then, for reasons I don't _exactly_
understand, data gets duplicated on the next upload attempt(s).

My vague understanding is that this is due, at least in part, to the fact
that JOSM uploads nodes first and only after that the information about
ways (i.e. which nodes belong to which ways). And then when it hasn't
gotten a confirmation of a successful upload (or it hasn't recorded that to
its data file(?)) it considers the uploaded nodes to still be new at the
next upload attempt(s).
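
To illustrate the mechanism described above, here is a toy model in Python.
It is only a sketch of my understanding, not JOSM or OSM API code: the
FakeServer and Editor classes and the lose_reply flag are made up for the
example. The server stores the nodes, the reply never reaches the editor,
and so the editor still treats the same nodes as new on the retry.

```python
import itertools


class FakeServer:
    """Hypothetical stand-in for the server: stores nodes, hands out ids."""

    def __init__(self):
        self.nodes = {}                      # server id -> (lat, lon)
        self._ids = itertools.count(1)

    def create_node(self, lat, lon):
        node_id = next(self._ids)
        self.nodes[node_id] = (lat, lon)
        return node_id


class Editor:
    """Hypothetical stand-in for the editor's local data layer."""

    def __init__(self, coords):
        # id None means "no server id yet", i.e. the node counts as new.
        self.pending = [{"coord": c, "id": None} for c in coords]

    def upload(self, server, lose_reply=False):
        new_nodes = [n for n in self.pending if n["id"] is None]
        ids = [server.create_node(*n["coord"]) for n in new_nodes]
        if lose_reply:
            # The server stored the nodes, but the confirmation never
            # arrived, so nothing is recorded locally.
            return False
        for node, node_id in zip(new_nodes, ids):
            node["id"] = node_id
        return True


server = FakeServer()
editor = Editor([(38.36, -75.60), (38.37, -75.61), (38.38, -75.62)])

editor.upload(server, lose_reply=True)   # bad connection: reply is lost
editor.upload(server)                    # retry: all three nodes sent again

duplicates = len(server.nodes) - len(set(server.nodes.values()))
print(len(server.nodes), duplicates)     # 6 nodes on the server, 3 are dupes
```

The only point of the sketch is that the editor decides what is "new" from
its own records, so a lost confirmation is enough to produce duplicates.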

I feel that duplication sometimes also happens with partial uploads where
the ways have uploaded, too, resulting in duplicate uploaded ways, but I
haven't documented this well enough to say so with confidence.

If you have a bad connection / feel that this may be your problem it is a
good idea to tweak the JOSM Advanced upload settings (Upload > Advanced
tab: "Upload data in chunks of objects. Chunk size: ____", where ____ is
your number of objects per chunk). I use 200 with my Haitian connection.
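
As a rough picture of what the chunk size does, here is a small Python
sketch. It is not how JOSM implements it; the chunked helper and the object
count of 1050 are made up for the example. The idea is just that each chunk
goes out as its own request, so a dropped connection puts at most one
chunk's worth of objects in doubt instead of the whole upload.

```python
def chunked(objects, chunk_size):
    """Yield consecutive slices of at most chunk_size objects."""
    for start in range(0, len(objects), chunk_size):
        yield objects[start:start + chunk_size]


objects = list(range(1050))            # hypothetical: 1050 objects to upload
chunks = list(chunked(objects, 200))   # chunk size 200, as suggested above

print(len(chunks))                     # 6 separate requests
print(len(chunks[-1]))                 # the last one carries the remaining 50
```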

Cheers,
-Jaakko
http://osm.org/user/jaakkoh
--
jaakko at helleranta.com * Skype: jhelleranta * Mobile: +509-37-269154  *
http://go.hel.cc/MyProfile



On Thu, Mar 22, 2012 at 8:28 AM, Marc Zoss <marczoss at gmail.com> wrote:

> Nick and Josh
>
> thanks for the clarification on your upload strategy. With previous large
> uploads I have experience the same behaviour resulting in massive dupes. So
> I guess it is not a conversion issue.
>
> If you want me to commit the remove duplicates changeset, I can do so. But
> you will have to go through the data subsequently and check if the issues
> are resolved and no new ones emerged.
>
> M
>
> On 22.03.2012, at 14:12, Nick Chamberlain wrote:
>
> > Josh and Marc,
> >
> > Thank you!  I apologize that I'm unable to speak the OSM language as
> > well as everyone, I'm working on it :)  I posted on the Salisbury,
> > Maryland Import page that Josh created to give more detail about my
> > uploads.
> >
> > I didn't really think that I created so many duplicates, because I did a
> > lot of things in JOSM before I actually chose to upload.  One thing I
> > > know for sure is that I didn't upload until I was actually able to - I
> > was getting a proxy error and the uploads were timing out when I
> > attempted to upload the entire batch.  I assumed that these attempts
> > were unsuccessful, which I might be wrong about and might have resulted
> > in duplication.
> >
> > I assumed that my successful attempts started, maybe @ 10901673, when I
> > realized I needed to break the original shapefile up tabularly into
> > percentiles and upload 10 segments of the building footprint dataset,
> > one after the other.  These were all definitely successful, and were
> > only done once per percentile.
> >
> > Josh, where are you finding the list of changesets in the format you
> > posted?  I can only figure out how to list them in my editor profile
> > with my comments.
> >
> > If you believe that the method you mention that removes the 71,000 nodes
> > is the best approach, please feel free to do so.  I will also gladly
> > manually fix the inner ring tagging issue as the data gets fixed.
> > Please let me know what I can do to help.  I am also willing to share
> > the .osm files and/or shapefiles if that will help.  Thanks.
> >
> > - Nick
> >
> > -----Original Message-----
> > From: joshthephysicist at gmail.com [mailto:joshthephysicist at gmail.com] On
> > Behalf Of Josh Doe
> > Sent: Thursday, March 22, 2012 8:51 AM
> > To: Marc Zoss
> > Cc: imports at openstreetmap.org; talk-us at openstreetmap.org; Nick
> > Chamberlain
> > Subject: Re: [Imports] [Talk-us] Uploads to City of Salisbury, MD
> >
> > On Thu, Mar 22, 2012 at 8:04 AM, Marc Zoss <marczoss at gmail.com> wrote:
> >> I briefly downloaded all sby:bldgtype-tagged ways and relations of
> >> Maryland through the overpass-api. Then removed the ones having only a
> >> sby:bldgtype tag, ran the validator and deleted the duplicated nodes and
> >> ways.
> >> This would result in a changeset to remove the roughly 71'000
> >> duplicate nodes and ways.
> >>
> >> If the area was edited since the import and reverting gets tricky,
> >> this might be the way to go; at least the result looks ok at
> >> first glance.
> >>
> >> Please also note that the conversion step seems to add a building=yes
> >> tag on the inner ring of building polygons () which is certainly bad
> >> tagging, despite the correct rendering (52 occurrences, so could be
> >> fixed manually).
> >
> > Thanks for doing that, as that was the next step I was going to try. I
> > > posted some notes regarding the changesets here:
> > > http://wiki.openstreetmap.org/wiki/User_talk:Nick_SPW#Salisbury.2C_Maryland_import
> >
> > I think perhaps we should revert a subset of the changesets, such as the
> > dangling nodes, and then use your method to handle the rest.
> >
> > -Josh
>
>
> _______________________________________________
> Imports mailing list
> Imports at openstreetmap.org
> http://lists.openstreetmap.org/listinfo/imports
>