[Talk-ca] [Imports] canvec-to-osm 0.9.6 now available

Ian Dees ian.dees at gmail.com
Sat Nov 7 15:14:58 GMT 2009


On Sat, Nov 7, 2009 at 8:59 AM, Frank Steggink <steggink at steggink.org> wrote:

> Sam, can you give some additional clarification of what your intentions
> are? I'm afraid I'm not following them well. When you mention removing
> duplicate nodes and relations, it sounds as if you intend to create a
> script which does some post-processing. Is that correct? I haven't started
> anything in that area. (I actually still need to start on the Python
> version of your batch script, but I'm going to work on that today.)
>

Are these duplicate nodes/relations being created by the converter or are
they in the source data?
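
If it helps to track that down, here's a rough way to count nodes that
share exact coordinates in one of the generated files (just a sketch;
"input.osm" is a placeholder name):

    # Sketch: count nodes sharing the exact same lat/lon in an .osm file.
    import xml.etree.ElementTree as ET
    from collections import Counter

    coords = Counter()
    for _, elem in ET.iterparse("input.osm"):
        if elem.tag == "node":
            coords[(elem.get("lat"), elem.get("lon"))] += 1
        elem.clear()

    dupes = sum(1 for n in coords.values() if n > 1)
    print(dupes, "coordinate pairs are used by more than one node")

If the generated files show duplicates that the source data doesn't have,
then the converter is the culprit.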


> Now that we're discussing this: in shp-to-osm (Java), tags are now put
> properly on the multipolygon relation. They also still appear on the
> inner polygons (as mentioned to Ian already), but that should be fixed.
>

Tags appear on the inner ways because you have an "inner" rule in the
rules.txt file. If you remove those lines from rules.txt, you should end up
with no tags on the inner ways of multipolygons. If that doesn't happen, then
it's a bug and I will fix it.
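
For example (the attribute names and values here are made up, and the exact
column layout may differ from your file -- only the leading "inner"/"outer"
rule type matters for this point), if rules.txt contains a pair like:

    outer,COVTYPE,wood,natural,wood
    inner,COVTYPE,wood,natural,wood

then dropping the second line should leave the inner ways bare while the
multipolygon relation keeps its tags.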


> Since shp-to-osm is called for one feature type at a time, there are some
> new challenges when multiple feature types are involved. I guess you've
> been thinking about that already. Duplicate nodes will become an issue
> when you have, for example, a residential area with an adjacent wooded
> area (assuming that the boundaries match exactly). It will be difficult
> to deal with this. I'm not sure if it would be technically possible to
> adjust shp-to-osm for that, but the result will be that the files become
> huge. They already have to be split up for certain feature types, and I
> don't think it is possible to use the same set of IDs across multiple
> output files.
>

I do have a "relationificator" plugin started for shp-to-osm that will
attempt to solve this problem by converting exactly-overlapping edges into
relations and deleting duplicate primitives. If there's a strong need for
it, I can continue to work on it.
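
For the curious, part of the idea is something like this (a Python sketch
of the approach only -- the real plugin is Java):

    # Collapse ways with identical geometry into one way, and point every
    # relation that referenced a duplicate at the surviving way.
    def dedupe_ways(ways, relations):
        # ways: way id -> tuple of node ids
        # relations: list of lists of member way ids
        canonical = {}  # direction-insensitive node sequence -> kept way id
        remap = {}      # dropped way id -> kept way id
        for wid, nodes in ways.items():
            key = min(nodes, nodes[::-1])  # same key for both directions
            if key in canonical:
                remap[wid] = canonical[key]
            else:
                canonical[key] = wid
        for members in relations:
            members[:] = [remap.get(w, w) for w in members]
        return {w: n for w, n in ways.items() if w not in remap}, relations

The adjacent residential and wooded areas from your example would then
share one border way, referenced from both multipolygon relations.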


> From what I understand about the upload process (and someone please
> correct me if this isn't right), the OSM server will return new ID
> numbers for any nodes, ways, and relations uploaded. In the OSM files
> generated by Ian, and also when you're editing in JOSM yourself,
> temporary IDs are assigned. They have a negative value, which indicates
> that these objects don't exist on the server yet. So this means that
> after you have uploaded file00.osm and you open file01.osm, neither JOSM
> nor the server remembers which objects any IDs are _referring_ to, if
> those objects are not _defined_ in the same file.
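
Right -- the placeholder IDs in a generated file look something like this
(coordinates invented):

    <node id='-1' lat='45.0' lon='-75.0' />
    <node id='-2' lat='45.0' lon='-74.9' />
    <way id='-10'>
      <nd ref='-1' />
      <nd ref='-2' />
    </way>

On upload the server hands back real positive IDs for -1, -2, and -10, so a
later file that still says <nd ref='-1' /> has no way to refer to the node
that -1 used to be.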


> The same issue occurs with multipolygon relations, where some of the ways
> are reused. This can only happen if everything is defined in the same
> file, and such a file will be way too large to upload safely to the
> server. Recently I noticed that if you want to create/update/delete about
> 10k objects, the server starts to "act difficult".


I normally upload my NHD changesets with 40k-50k changes in each upload
without problem. It takes an hour or so to apply (depending on server load),
but it works without error.


> Regarding relations and reuse of geometry: I think we not only have to
> remove duplicate nodes, but also split up ways; otherwise the JOSM
> validator will complain about overlapping ways. A way can be used in
> multiple relations.
>

The relationificator would do this splitting as well.
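
After the splitting, the shared border ends up as one way referenced from
both relations, along these lines (IDs invented):

    <relation id='-20'>
      <member type='way' ref='-30' role='outer' />  <!-- shared border -->
      <member type='way' ref='-31' role='outer' />
    </relation>
    <relation id='-21'>
      <member type='way' ref='-30' role='outer' />
      <member type='way' ref='-32' role='outer' />
    </relation>

Each relation carries its own tags, and the shared border way itself can
stay untagged.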


> A third thing which might need to be resolved is map features which cross
> the boundaries of the NTS tiles. Do we want to merge them? If these
> features have the same GeoBase metadata (ID, etc.), then it shouldn't be
> a big problem; otherwise we need to decide whether we prefer to keep the
> metadata or to have merged features.
>
> All of this might suggest that we can't do anything to clean up the data.
> Sure we can, but it can only be done after an initial upload to the
> server. That way we can still apply any logic to deal with duplicate
> nodes, reuse of features in multiple relations, and merging of features.
> The script will have to work live on the server: download an area, do the
> cleanup, and upload. In that case I think it would be safest (and
> required!) for the script to only do the download and the cleanup, and
> for a human to verify the result before upload. If we implement such a
> cleanup, it needs to be executed as soon as possible after the upload,
> because sometimes users are very quick to make changes to freshly
> uploaded data.
>

It's relatively dangerous to upload these "dirty" OSM files to the server
and then apply changes later. If someone makes a change to even a single
node in your data, then you suddenly can't apply that automated change.
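
The API's versioning is what bites you: every object carries a version
number, and a <modify> in an osmChange upload must quote the version it is
changing. A sketch of the failure (IDs and changeset invented):

    <osmChange version='0.6'>
      <modify>
        <node id='123456' version='1' changeset='999'
              lat='45.0' lon='-75.0' />
      </modify>
    </osmChange>

If a mapper has already edited node 123456, its server version is now 2,
and this upload (still claiming version='1') is rejected with a conflict,
so the scripted cleanup stops and a human has to sort it out.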