<div class="gmail_quote">On Sat, Nov 7, 2009 at 8:59 AM, Frank Steggink <span dir="ltr"><<a href="mailto:steggink@steggink.org">steggink@steggink.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Sam, can you give some additional clarification of what your intentions are? I'm afraid I'm not following them well. When you mention removing duplicate nodes and relations, it sounds as if you intend to create a script that does some post-processing. Is that correct? I haven't started anything in that area. (I actually still need to start on the Python version of your batch script, but I'm going to work on that today.)<br>
</blockquote><div><br>Are these duplicate nodes/relations being created by the converter or are they in the source data?<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Now that we're on this topic: in shp-to-osm (Java), tags are now put properly on the multipolygon relation. They also still appear on the inner polygons (mentioned to Ian already), but that should be fixed.<br></blockquote>
<div><br>Tags appear on the inner ways because you have an "inner" rule in the rules.txt file. If you remove those lines from rules.txt, you should end up with no tags on the inner ways of multipolygons. If that doesn't happen, then it's a bug and I will fix it.<br>
</div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Since shp-to-osm is called for one feature type at a time, there are some new challenges when multiple feature types are involved. I guess you've been thinking about that already. Duplicate nodes will become an issue when you have, for example, a residential area with an adjacent wooded area (assuming that the boundaries match exactly). It will be difficult to deal with this. I'm not sure if it would be technically possible to adjust shp-to-osm for that, but the result would be that the files become huge. They already have to be split up for certain feature types, and I don't think it is possible to use the same set of IDs across multiple output files.<br>
</blockquote><div><br>I do have a "relationificator" plugin started for shp-to-osm that will attempt to solve this problem by converting exactly-overlapping edges into relations and deleting duplicate primitives. If there's a strong need for it, I can continue to work on it.<br>
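The duplicate-node part of the idea can be sketched roughly like this. This is a minimal illustration of the technique, not the actual shp-to-osm plugin code; the function and data layout are hypothetical:

```python
# Hypothetical sketch: merge nodes that share identical coordinates
# and rewrite way references, so that two adjacent polygons end up
# sharing the node objects along their common boundary.
# Nodes are (id, lat, lon) tuples; ways are lists of node ids.

def dedupe_nodes(nodes, ways):
    """Merge nodes with identical coordinates and rewrite way refs."""
    canonical = {}   # (lat, lon) -> surviving node id
    remap = {}       # old node id -> surviving node id
    kept = []
    for node_id, lat, lon in nodes:
        key = (lat, lon)
        if key in canonical:
            remap[node_id] = canonical[key]   # duplicate: drop it
        else:
            canonical[key] = node_id
            remap[node_id] = node_id
            kept.append((node_id, lat, lon))
    new_ways = [[remap[ref] for ref in way] for way in ways]
    return kept, new_ways
```

Once two adjacent areas reference the same node objects, a later pass can detect the shared edge and factor it out into a single way used by both multipolygon relations.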
</div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
From what I understand about the upload process (and someone please correct me if this isn't right), the OSM server will return new ID numbers for any nodes, ways, and relationships uploaded. In the OSM files generated by Ian, and also when you're editing in JOSM yourself, temporary IDs are assigned. They have a negative value, which indicates that these objects don't exist on the server. So this means that, after you have uploaded file00.osm and you open file01.osm, neither JOSM nor the server any longer knows which objects any IDs are _referring_ to, if those objects are not _defined_ in the same file. </blockquote>
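To illustrate the placeholder mechanism (a hypothetical sketch, not real OSM API code): when the server accepts an upload it hands back a mapping from each negative placeholder ID to the newly assigned permanent ID, and that mapping exists only in the uploading session — a second file that reuses the same negative IDs has no way to recover it.

```python
# Hypothetical sketch: the server assigns permanent ids to uploaded
# objects and returns a placeholder -> permanent-id mapping.

def upload(objects, next_id=1000):
    """Simulate the server replacing negative placeholder ids."""
    id_map = {}
    for obj in objects:
        id_map[obj["id"]] = next_id
        next_id += 1
    return id_map

# file00.osm uses placeholders -1 and -2:
id_map = upload([{"id": -1}, {"id": -2}])
# id_map now pairs each placeholder with its permanent id.

# If file01.osm also references node -1, that pairing lived only in
# the previous session -- the server keeps no memory of it, so the
# reference cannot be resolved.
```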
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
The same issue arises with multipolygon relationships, where some of the ways are reused. This can only work if everything is defined in the same file, and such a file will be way too large to upload safely to the server. Recently I noticed that if you want to create/update/delete about 10k objects, the server is going to "act difficult". </blockquote>
<div><br>I normally upload my NHD changesets with 40k-50k changes in each upload without problem. It takes an hour or so to apply (depending on server load), but it works without error.<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Regarding relationships, and reuse of the geometry: I think we not only have to remove duplicate nodes but also split up ways, otherwise the JOSM validator will complain about overlapping ways. A way can be used in multiple relationships.<br>
</blockquote><div><br>This would also happen with the relationificator.<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
A third thing that might need to be resolved is map features which cross the boundary of the NTS tiles. Do we want to merge them? If these features have the same Geobase metadata (ID, etc.), then it shouldn't be a big problem; otherwise we need to decide whether we prefer to keep the metadata or to have merged features.<br>
<br>
Does all of this mean we can't do anything to clean up the data? Sure we can, but it can only be done after an initial upload to the server. That way we can still apply any logic to deal with duplicate nodes, reuse of features in multiple relationships, and merging features. The script will have to work live on the server: download an area, do the cleanup, and upload. In that case I think it would be safest (and required!) for the script to only do the download and the cleanup, and for a human to verify the result before upload. If we implement such a cleanup, it needs to be executed as soon as possible after the upload, because sometimes users are very quick to make changes to freshly uploaded data.<br>
</blockquote><div><br>It's relatively dangerous to upload these "dirty" OSM files to the server and then apply changes later. If someone makes a change to a single node in your data, then you suddenly can't make that automated change.<br>
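The reason the automated change fails is the per-object version number the OSM API keeps: an upload that modifies an object must be based on the object's current version, and the server rejects it with a conflict if someone edited the object in between. A rough sketch of that check (hypothetical code, not the server's actual implementation):

```python
# Hypothetical sketch of the optimistic-locking check the OSM API
# performs: an edit based on a stale version number is rejected.

class Conflict(Exception):
    pass

def apply_change(server_objects, change):
    """Apply a change only if it was based on the current version."""
    current = server_objects[change["id"]]
    if change["based_on_version"] != current["version"]:
        raise Conflict("object %d was modified by someone else" % change["id"])
    current["tags"] = change["tags"]
    current["version"] += 1

server = {42: {"version": 1, "tags": {"natural": "wood"}}}

# A mapper edits node 42 first, bumping its version to 2...
apply_change(server, {"id": 42, "based_on_version": 1,
                      "tags": {"natural": "wood", "name": "Bois"}})

# ...so an automated cleanup still based on version 1 now fails.
try:
    apply_change(server, {"id": 42, "based_on_version": 1, "tags": {}})
except Conflict:
    print("cleanup rejected: a manual edit happened first")
```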
</div></div>