[Talk-ca] [Imports] canvec-to-osm 0.9.6 now available

Frank Steggink steggink at steggink.org
Sat Nov 7 15:56:38 GMT 2009


Hi Ian,

Ian Dees wrote:
> On Sat, Nov 7, 2009 at 8:59 AM, Frank Steggink <steggink at steggink.org 
> <mailto:steggink at steggink.org>> wrote:
>
>     Sam, can you give some additional clarification what your
>     intentions are? I'm afraid I'm not following them well. When you
>     mention removing duplicate nodes and relations, it looks as if
>     you intend to create a script which does some post-processing. Is
>     that correct? I haven't started anything in that area. (I actually
>     still need to start with the Python version of your batch script,
>     but I'm going to work on that today.)
>
>
> Are these duplicate nodes/relations being created by the converter or 
> are they in the source data?
I'm talking mostly from my experience with Geobase. When roads are 
imported, where there is already data from an adjacent tile, nodes get 
duplicated. This also happens when I copy over Geobase data multiple times.

I'm not sure how this works out for the Canvec data, because it already 
comes in multiple files. However, if features in multiple files are 
touching each other, their nodes will be identified as duplicates. This 
is also true for adjacent NTS tiles. The problem with shp-to-osm which I 
found earlier this week has already been fixed :)
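
In case it helps, here is roughly how such duplicates could be detected
in a generated .osm file. It is only a minimal sketch, assuming
duplicate nodes carry byte-identical lat/lon attributes (true for
copied tile data, less so for independently digitized geometry):

# Minimal sketch: report nodes sharing identical coordinates in an
# .osm file, the way boundary nodes of adjacent tiles get duplicated.
# Assumes duplicates have exactly equal lat/lon attribute strings; a
# real tool would compare within a tolerance and also look at tags.
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict

def find_duplicate_nodes(osm_path):
    nodes_at = defaultdict(list)  # (lat, lon) -> [node ids]
    for _, elem in ET.iterparse(osm_path):
        if elem.tag == "node":
            nodes_at[(elem.get("lat"), elem.get("lon"))].append(elem.get("id"))
            elem.clear()  # free the parsed node element
    return dict((pos, ids) for pos, ids in nodes_at.items() if len(ids) > 1)

if __name__ == "__main__":
    for (lat, lon), ids in find_duplicate_nodes(sys.argv[1]).items():
        print("%s,%s: %s" % (lat, lon, " ".join(ids)))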
>  Now that we're on this topic: in shp-to-osm (Java), tags are now put 
> properly on the multipolygon relationship. They also still appear on 
> the inner polygons (mentioned to Ian already), but that should be fixed.
>
> Tags appear on the inner ways because you have an "inner" rule in the 
> rules.txt file. If you remove those lines from the rules.txt, you 
> should end up with no tags on inner ways of multipolygons. If this 
> doesn't happen, then it's a bug and I will fix it.
OK, that could explain why this is happening. I'll look at it, and share 
my observations.
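
If the "inner" rules do turn out to be the cause, a quick way to test
is to strip them from rules.txt before converting. A trivial sketch,
assuming the first comma-separated field of each rule line is the
geometry type ("point", "line", "outer", "inner", ...):

# Sketch: write a copy of rules.txt without the "inner" rules, so that
# shp-to-osm leaves the inner ways of multipolygons untagged.
with open("rules.txt") as src:
    with open("rules-no-inner.txt", "w") as dst:
        for line in src:
            if line.split(",")[0].strip().lower() != "inner":
                dst.write(line)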
>  
>
>     Since shp-to-osm is called for one feature type at a time, there
>     are some new challenges when multiple feature types are involved.
>     I guess you've been thinking about that already. Duplicate nodes
>     will become an issue when you have, for example, a residential area
>     with an adjacent wooded area (assuming that the boundaries are
>     matching exactly). It will be difficult to deal with this. I'm not
>     sure if it would be technically possible to adjust shp-to-osm for
>     that, but the result will be that the files will become huge. They
>     already have to be split up for certain feature types, and I don't
>     think it is possible to use the same set of IDs over multiple
>     output files.
>
>
> I do have a "relationificator" plugin started for shp-to-osm that will 
> attempt to solve this problem by converting exactly-overlapping edges 
> into relations and deleting duplicate primitives. If there's a strong 
> need for it I can continue to work on it.
That sounds very interesting. I guess this only works within the single 
shape file which is being converted, correct? What is the behavior if a 
file has to be split up, because it becomes too large?
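
I would guess such a plugin keys on the node sequences. A rough sketch
of how exactly-overlapping ways could be grouped (plain dicts for
illustration, not shp-to-osm's actual data model):

# Rough sketch: group ways whose node sequences are exactly equal,
# forwards or reversed -- the candidates a "relationificator" could
# collapse into one way shared by several multipolygon relations.
from collections import defaultdict

def overlapping_way_groups(ways):
    groups = defaultdict(list)
    for way in ways:
        nds = tuple(way["nds"])
        # Normalize direction so a way and its reverse compare equal.
        groups[min(nds, nds[::-1])].append(way["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

print(overlapping_way_groups([
    {"id": -1, "nds": [-10, -11, -12]},
    {"id": -2, "nds": [-12, -11, -10]},  # same edge, opposite direction
    {"id": -3, "nds": [-20, -21]},
]))  # -> [[-1, -2]]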
>  
>
>     From what I understand about the upload process (and someone
>     please correct me if this isn't right), the OSM server will return
>     new ID numbers for any nodes, ways, and relationships uploaded. In
>     the OSM files generated by Ian, and also when you're editing in
>     JOSM yourself, temporary IDs are assigned. They have a negative
>     value, which indicates that these objects don't exist on the
>     server. So, this means that, after you have uploaded file00.osm,
>     and you open file01.osm, neither JOSM nor the server remembers
>     which objects any IDs are _referring_ to, if those objects are
>     not _defined_ in the same file. 
>
>
>     The same issue is going on with multipolygon relationships, where
>     some of the ways are reused. This can only happen if everything
>     is defined in the same file. And such a file will be way too large
>     to upload safely to the server. Recently I noticed that if you
>     want to create/update/delete about 10k objects, the server is going
>     to "act difficult". 
>
>
> I normally upload my NHD changesets with 40k-50k changes in each 
> upload without problem. It takes an hour or so to apply (depending on 
> server load), but it works without error.
What program are you using for the upload? Is it bulk-upload, JOSM, or 
something else? I'm using JOSM, because I had problems with bulk-upload. 
If there is something better and more robust (the upload with JOSM fails 
when there are about 10k changes), I would certainly give it a try.
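
By the way, regarding the negative-ID problem above: the API 0.6 diff
upload returns a diffResult document that maps the temporary negative
IDs to the server-assigned ones, so in principle a script could carry
that mapping over to the next file. A sketch (the file names are made
up, and only node and relation member references are rewritten):

# Sketch: after uploading file00.osm as an API 0.6 diff, parse the
# returned diffResult (elements with old_id/new_id attributes) and
# rewrite the references in file01.osm so it can reuse the objects
# created by the first upload. File names here are made up.
import xml.etree.ElementTree as ET

def id_map(diff_result_path):
    mapping = {}
    for elem in ET.parse(diff_result_path).getroot():
        mapping[(elem.tag, elem.get("old_id"))] = elem.get("new_id")
    return mapping

def remap(osm_path, mapping, out_path):
    root = ET.parse(osm_path).getroot()
    for nd in root.iter("nd"):  # node references inside ways
        new_id = mapping.get(("node", nd.get("ref")))
        if new_id is not None:
            nd.set("ref", new_id)
    for member in root.iter("member"):  # relation members
        new_id = mapping.get((member.get("type"), member.get("ref")))
        if new_id is not None:
            member.set("ref", new_id)
    ET.ElementTree(root).write(out_path)

remap("file01.osm", id_map("diffResult00.xml"), "file01.remapped.osm")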
>  
>
>     Regarding relationships and reuse of geometry: I think that we
>     not only have to remove duplicate nodes, but also split up
>     ways; otherwise the JOSM validator will complain about overlapping
>     ways. A way can be used in multiple relationships.
>
>
> This would also happen with the relationificator.
>  
>
>     A third thing which might need to be resolved is map features
>     which cross the boundary of the NTS tiles. Do we want to merge
>     them? If these features have the same Geobase metadata (ID, etc.),
>     then it shouldn't be a big problem, otherwise we need to decide
>     whether we prefer to keep the metadata, or if we want to have
>     merged features.
>
>     Does all of this mean we can't do anything to clean up the data?
>     Sure we can, but this can only be done after an initial upload to the
>     server. That way we can still apply any logic to deal with
>     duplicate nodes, reuse of features in multiple relationships, and
>     merging features. The script will have to work live on the server:
>     download an area, do the cleanup, and upload. In that case I think
>     it would be safest (and required!) that the script only does
>     the download and the cleanup, and that a human verifies the result
>     before upload. If we're implementing such cleanup, it needs to be
>     executed as soon as possible after the upload, because sometimes
>     users are very quick to make changes to freshly uploaded data.
>
>
> It's relatively dangerous to upload these "dirty" OSM files to the 
> server and then apply changes later. If someone makes a change to a 
> single node in your data, then you suddenly can't make that automated 
> change.
Exactly, that is why this should happen as soon as possible. However, 
how are we going to deal with large files? Ultimately this might mean 
that all Canvec data of a single tile has to be uploaded at once.
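
As an aside, on merging features across NTS tile boundaries: when the
pieces share the same Geobase metadata, the join itself is mechanical.
A rough sketch (plain dicts; geobase:uuid is the tag the road import
uses, and I'm assuming Canvec carries something comparable):

# Rough sketch: join two ways that carry the same source ID and share
# an endpoint node across a tile boundary. Plain dicts are used for
# illustration; the "geobase:uuid" tag is an assumption for Canvec.
def merge_pair(a, b):
    # Return the merged node list, or None if there is no shared end.
    if a["nds"][-1] == b["nds"][0]:
        return a["nds"] + b["nds"][1:]
    if a["nds"][-1] == b["nds"][-1]:
        return a["nds"] + b["nds"][-2::-1]
    if a["nds"][0] == b["nds"][-1]:
        return b["nds"] + a["nds"][1:]
    if a["nds"][0] == b["nds"][0]:
        return b["nds"][::-1] + a["nds"][1:]
    return None  # same ID but no shared endpoint: leave it for a human

a = {"tags": {"geobase:uuid": "abc-123"}, "nds": [1, 2, 3]}
b = {"tags": {"geobase:uuid": "abc-123"}, "nds": [3, 4, 5]}
assert a["tags"]["geobase:uuid"] == b["tags"]["geobase:uuid"]
print(merge_pair(a, b))  # -> [1, 2, 3, 4, 5]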

Frank




