[Talk-us] Duplicates in data uploads (using JOSM) -- was: Re: [Imports] Uploads to City of Salisbury, MD

Nick Chamberlain nchamberlain at ci.salisbury.md.us
Thu Mar 22 14:57:15 GMT 2012



Thank you for the explanation.  I will tweak my chunk sizes further next
time.  I did so before, but they were still fairly large and took a few
hours per upload.  Reducing them might take longer, but if that fixes
duplication I will do that.  Thanks.


- Nick


From: Jaakko Helleranta.com [mailto:jaakko at helleranta.com] 
Sent: Thursday, March 22, 2012 10:47 AM
To: Marc Zoss
Cc: Nick Chamberlain; Josh Doe; imports at openstreetmap.org;
talk-us at openstreetmap.org
Subject: Duplicates in data uploads (using JOSM) -- was: Re: [Imports]
[Talk-us] Uploads to City of Salisbury, MD


"With previous large uploads I have experienced the same behaviour
resulting in massive dupes. So I guess it is not a conversion issue."


I don't have experience with conversions or (mass) imports -- but I
_have_ had "massive dupes" problems a number of times when uploading
larger amounts of data with JOSM over a bad connection. The problem has
always been related to the combination of large uploads and bad
connections where (if I understand right) the JOSM data upload
connection gets a hiccup at some point and isn't able to finish the job
-- and doesn't leave a note for itself about where it left off. Then,
for reasons I don't _exactly_ understand, there's duplication of
data on the next upload attempt(s).


My vague understanding is that this is due to at least the fact that
JOSM uploads nodes first and only after that the information about ways
(i.e. which nodes belong to which ways). And then, when it hasn't gotten
a confirmation for successful uploads (or it hasn't recorded that to
its data file(?)), it considers the uploaded nodes to still be new at
the next upload attempt(s).
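
To make the suspected mechanism concrete, here is a toy model in Python.
It is only a sketch of the failure mode as I understand it, not JOSM code
and not the OSM API: ToyServer, create_nodes, upload and the drop_reply
flag are all made-up names, and the coordinates are arbitrary. The
"server" stores the nodes, the reply is lost on the way back, and the
client therefore still treats the nodes as new on the retry.

```python
# Toy model only: hypothetical names, not JOSM internals or the OSM API.
class ToyServer:
    def __init__(self):
        self.nodes = {}      # server-assigned id -> (lat, lon)
        self.next_id = 1

    def create_nodes(self, coords, drop_reply=False):
        """Store every node; optionally lose the reply on the way back."""
        ids = []
        for coord in coords:
            self.nodes[self.next_id] = coord
            ids.append(self.next_id)
            self.next_id += 1
        if drop_reply:
            raise ConnectionError("reply lost after the server stored the data")
        return ids


def upload(server, local_nodes, drop_reply=False):
    """Upload nodes that have no server id yet; record ids only on success."""
    pending = [n for n in local_nodes if n["id"] is None]
    try:
        ids = server.create_nodes([n["coord"] for n in pending], drop_reply)
    except ConnectionError:
        return False         # the client still believes these nodes are new
    for node, new_id in zip(pending, ids):
        node["id"] = new_id
    return True


server = ToyServer()
local = [{"id": None, "coord": (38.36, -75.60)},
         {"id": None, "coord": (38.37, -75.61)}]

first_ok = upload(server, local, drop_reply=True)   # connection hiccup
second_ok = upload(server, local)                   # retry of the same data
duplicates = len(server.nodes) - len(local)
print(first_ok, second_ok, len(server.nodes), duplicates)
```

In this model the retry succeeds, but the server ends up holding every
node twice, because the first attempt was stored without the client ever
learning the assigned ids.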


I feel that duplication sometimes also happens with partial uploads where
the ways have been uploaded, too, resulting in duplicate ways, but I
haven't documented this well enough to say so with confidence.


If you have a bad connection / feel that this may be your problem, it is
a good idea to tweak the JOSM Advanced upload settings (Upload >
Advanced tab: "Upload data in chunks of objects. Chunk size: ____",
where ____ is your number of objects per chunk). I use 200 with my
Haitian connection.
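
As a rough illustration of what a chunk size means, the Python sketch
below splits a list of objects into batches of at most 200. It is not how
JOSM implements the setting; the chunks helper and the stand-in object
list are assumptions made for the example. The point is only that a
failed request then leaves at most one chunk in doubt rather than the
whole upload.

```python
def chunks(objects, size):
    """Yield consecutive slices of at most `size` objects."""
    if size < 1:
        raise ValueError("chunk size must be at least 1")
    for start in range(0, len(objects), size):
        yield objects[start:start + size]


objects = list(range(1, 1001))        # stand-in for 1,000 new objects
batches = list(chunks(objects, 200))  # 200 is the chunk size mentioned above
print(len(batches), len(batches[0]), batches[-1][-1])
```

Smaller chunks mean more requests and a slower upload overall, which is
the trade-off against a smaller amount of data at risk per request.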






jaakko at helleranta.com * Skype: jhelleranta * Mobile: +509-37-269154  *

On Thu, Mar 22, 2012 at 8:28 AM, Marc Zoss <marczoss at gmail.com> wrote:

Nick and Josh

thanks for the clarification on your upload strategy. With previous
large uploads I have experienced the same behaviour resulting in massive
dupes. So I guess it is not a conversion issue.

If you want me to commit the remove duplicates changeset, I can do so.
But you will have to go through the data subsequently and check if the
issues are resolved and no new ones emerged.
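
For anyone checking the data afterwards, duplicated nodes of the kind
discussed in this thread can be spotted by grouping nodes on their
position. The Python sketch below is an assumption-laden illustration,
not the JOSM validator's actual test: find_duplicate_nodes, the
(id, lat, lon) tuples and the rounding to 7 decimal places are all
choices made for the example, and the sample ids and coordinates are
invented.

```python
from collections import defaultdict


def find_duplicate_nodes(nodes, precision=7):
    """Group node ids by rounded (lat, lon); return groups with more than one id."""
    by_position = defaultdict(list)
    for node_id, lat, lon in nodes:
        key = (round(lat, precision), round(lon, precision))
        by_position[key].append(node_id)
    return [sorted(ids) for ids in by_position.values() if len(ids) > 1]


# Invented sample data: nodes 3 and 4 repeat the positions of nodes 1 and 2.
nodes = [
    (1, 38.3607, -75.5994),
    (2, 38.3610, -75.6001),
    (3, 38.3607, -75.5994),
    (4, 38.3610, -75.6001),
    (5, 38.3700, -75.6100),
]
groups = find_duplicate_nodes(nodes)
print(groups)
```

Each returned group lists ids that share a position. Whether they can
really be merged still needs a manual look, since nodes at the same
position may legitimately carry different tags.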


On 22.03.2012, at 14:12, Nick Chamberlain wrote:

> Josh and Marc,
> Thank you!  I apologize that I'm unable to speak the OSM language as
> well as everyone, I'm working on it :)  I posted on the Salisbury,
> Maryland Import page that Josh created to give more detail about my
> uploads.
> I didn't really think that I created so many duplicates, because I did
> a lot of things in JOSM before I actually chose to upload.  One thing I
> know for sure is that I didn't upload until I was actually able to - I
> was getting a proxy error and the uploads were timing out when I
> attempted to upload the entire batch.  I assumed that these attempts
> were unsuccessful, which I might be wrong about and might have resulted
> in duplication.
> I assumed that my successful attempts started, maybe @ 10901673, when I
> realized I needed to break the original shapefile up tabularly into
> percentiles and upload 10 segments of the building footprint dataset,
> one after the other.  These were all definitely successful, and were
> only done once per percentile.
> Josh, where are you finding the list of changesets in the format you
> posted?  I can only figure out how to list them in my editor profile
> with my comments.
> If you believe that the method you mention that removes the 71,000
> duplicates is the best approach, please feel free to do so.  I will also
> gladly manually fix the inner ring tagging issue as the data gets fixed.
> Please let me know what I can do to help.  I am also willing to share
> the .osm files and/or shapefiles if that will help.  Thanks.
> - Nick
> -----Original Message-----
> From: joshthephysicist at gmail.com [mailto:joshthephysicist at gmail.com]
> On Behalf Of Josh Doe
> Sent: Thursday, March 22, 2012 8:51 AM
> To: Marc Zoss
> Cc: imports at openstreetmap.org; talk-us at openstreetmap.org; Nick
> Chamberlain
> Subject: Re: [Imports] [Talk-us] Uploads to City of Salisbury, MD
> On Thu, Mar 22, 2012 at 8:04 AM, Marc Zoss <marczoss at gmail.com> wrote:
>> I briefly downloaded all sby:bldgtype-tagged ways and relations of
>> Maryland through the overpass-api. Then removed the ones having only a
>> sby:bldgtype tag, ran the validator and deleted the duplicated nodes
>> and ways.
>> This would result in a changeset to remove the roughly 71'000
>> duplicate nodes and ways.
>> If the area was edited since the import and reverting gets tricky,
>> this might be the way to go; at least the result looks ok at
>> first glance.
>> Please also note that the conversion step seems to add a building=yes
>> tag on the inner ring of building polygons () which is certainly bad
>> tagging, despite the correct rendering (52 occurrences, so could be
>> fixed manually).
> Thanks for doing that, as that was the next step I was going to try. I
> posted some regarding the changesets here:
> and_import
> I think perhaps we should revert a subset of the changesets, such as
> dangling nodes, and then use your method to handle the rest.

> -Josh
