[Imports] canvec-to-osm 0.9.6 now available

Frank Steggink steggink at steggink.org
Sat Nov 7 14:59:18 GMT 2009


Sam Vekemans wrote:
> Im not making a new script, the folks who understand python, are
> making a script that can solve the 'duplicate intersecting nodes' and
> 'inner/outer relation' problem.
>
> (Yes you might have fixed it, but "i've reached the crux of my
> technical ability")
>
> Sure, they can use java, if that works for them.
>
> Im just isolating my canvec-to-osm script to handle the 80 map
> features that i know converts correctly. The other 10 can be dealt
> with using another option.
>
> Since i dont know how to program (accept in basic DOS) im not that much help.
>
> Others who are experienced in python & java are welcome to take the lead.
>
> Frank already started to help out maptastically :)
>
> Sam
>
> On 11/6/09, Ian Dees <ian.dees at gmail.com> wrote:
>   
>> On Nov 6, 2009, at 9:28 PM, Sam Vekemans
>> <acrosscanadatrails at gmail.com> wrote:
>>
>>     
>>> This python script is yet to be made 'canvec2osm.py' its open to
>>> anyone to make, and i recommend that who ever does, its ONLY 1 person
>>> who is in charge of maintaining the script.
>>>       
>> Why are you creating another shp to osm converter? Is there something
>> the existing tools don't do? I thought you were using shp-to-osm? What
>> changed?
>>     

Hi Sam,

The idea I mentioned is to convert your batch files to Python. The 
main reason is that people using Linux or other OSs can then also 
perform this conversion locally. I know you want to convert all of 
Canada yourself, but that is still a lot of work. Many hands make 
light work. The only thing we need to be concerned about is that we're 
all using the same rule file. That is the most vital part of ensuring 
consistency across the country.

To Ian: shp-to-osm won't disappear in the process; Sam's batch files 
still call it. It would be pointless to create a second version. 
There is already a Python script with the same name, but it was written 
specifically for the MassGIS import.

Sam, can you clarify what your intentions are? I'm afraid I'm not 
following them well. When you mention removing duplicate nodes and 
relations, it sounds as if you intend to create a script that does 
some post-processing. Is that correct? I haven't started anything in 
that area. (I actually still need to start on the Python version of 
your batch script, but I'm going to work on that today.)

While we're on the subject: in shp-to-osm (Java), tags are now placed 
properly on the multipolygon relation. They also still appear on the 
inner polygons (I mentioned this to Ian already), but that should be fixed.

Since shp-to-osm is called for one feature type at a time, there are 
some new challenges when multiple feature types are involved. I guess 
you've been thinking about that already. Duplicate nodes become an 
issue when you have, for example, a residential area with an adjacent 
wooded area (assuming the boundaries match exactly). This will be 
difficult to deal with. I'm not sure whether it would be technically 
possible to adjust shp-to-osm for that, but the result would be that 
the files become huge. They already have to be split up for certain 
feature types, and I don't think it is possible to use the same set of 
IDs across multiple output files.
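To make the duplicate-node problem concrete, here is a minimal sketch of how shared boundary nodes could be collapsed after the fact. This is purely illustrative, not part of shp-to-osm or any existing script; the data model (nodes as (id, lat, lon) tuples, ways as lists of node ids) is an assumption for the example.

```python
# Hypothetical sketch: collapse nodes that share (rounded) coordinates,
# as happens where a residential polygon and a wooded polygon share an
# exactly matching boundary, and remap the way references accordingly.

def dedupe_nodes(nodes, ways, precision=7):
    """Merge nodes with identical coordinates; return surviving nodes
    and ways rewritten to reference the surviving node ids."""
    canonical = {}   # rounded (lat, lon) -> surviving node id
    remap = {}       # old node id -> surviving node id
    kept = []
    for nid, lat, lon in nodes:
        key = (round(lat, precision), round(lon, precision))
        if key in canonical:
            remap[nid] = canonical[key]
        else:
            canonical[key] = nid
            remap[nid] = nid
            kept.append((nid, lat, lon))
    new_ways = [[remap[n] for n in way] for way in ways]
    return kept, new_ways

# Two adjacent triangles sharing the edge -1/-2 (duplicated as -5/-6):
nodes = [(-1, 50.0, -100.0), (-2, 50.0, -99.9), (-3, 50.1, -99.9),
         (-5, 50.0, -100.0), (-6, 50.0, -99.9), (-7, 49.9, -99.9)]
ways = [[-1, -2, -3, -1], [-5, -6, -7, -5]]
kept, new_ways = dedupe_nodes(nodes, ways)
print(len(kept))      # 4 nodes survive out of 6
print(new_ways[1])    # [-1, -2, -7, -1]
```

Note that this only works when both files' nodes are in memory at once, which is exactly the file-size problem described above.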

From what I understand about the upload process (and someone please 
correct me if this isn't right), the OSM server returns new ID 
numbers for any nodes, ways, and relations that are uploaded. In the OSM 
files generated by Ian, and also when you're editing in JOSM yourself, 
temporary IDs are assigned. They have a negative value, which indicates 
that these objects don't exist on the server yet. So, after you have 
uploaded file00.osm and you open file01.osm, neither JOSM nor the 
server remembers which objects those IDs are _referring_ to, if those 
objects are not _defined_ in the same file.
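The cross-file ID problem can be shown in a few lines. This is a deliberately simplified model of the upload step (the `upload` function below is a stand-in, not the real API): the server hands back fresh positive ids for placeholder negative ids, and a second file that reuses those negative ids has nothing to resolve them against.

```python
# Simplified model (assumption, not the real OSM API): the server assigns
# fresh positive ids to objects uploaded with negative placeholder ids.

def upload(objects, next_id=1):
    """Simulate the server mapping placeholder ids to real ids."""
    id_map = {}
    for obj in objects:
        id_map[obj], next_id = next_id, next_id + 1
    return id_map

file00 = [-1, -2, -3]        # nodes defined in file00.osm
id_map = upload(file00)      # {-1: 1, -2: 2, -3: 3}

file01_way = [-2, -3, -4]    # file01.osm reuses ids -2 and -3 ...
# ... which only resolve because we happened to keep id_map around;
# JOSM opening file01.osm on its own has no such mapping:
resolved = [id_map.get(n, n) for n in file01_way]
print(resolved)              # [2, 3, -4]
```

A splitter that wanted to share geometry across files would have to carry something like `id_map` between uploads, which neither JOSM nor the plain .osm files do.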

The same issue applies to multipolygon relations in which some of the 
ways are reused. This can only work if everything is defined in the 
same file, and such a file would be far too large to upload safely to 
the server. Recently I noticed that if you want to create/update/delete 
around 10k objects, the server starts to "act difficult". Regarding 
relations and reuse of geometry: I think we not only have to remove 
duplicate nodes but also split up ways, otherwise the JOSM validator 
will complain about overlapping ways. A way can be used in multiple 
relations.
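Finding the stretch of geometry that two rings duplicate is the first step of such a way split. As a sketch (again hypothetical, with ways as plain lists of node ids), the shared edges can be found by comparing the rings edge by edge, ignoring direction:

```python
# Hypothetical sketch: find the edges two closed ways have in common, so
# the shared stretch can be split off into its own way and referenced by
# both multipolygon relations instead of overlapping.

def shared_segments(way_a, way_b):
    """Return the set of undirected edges that appear in both ways."""
    def edges(way):
        return {frozenset(pair) for pair in zip(way, way[1:])}
    return edges(way_a) & edges(way_b)

ring_a = [1, 2, 3, 4, 1]
ring_b = [5, 3, 2, 6, 5]   # traverses the 2-3 edge in the opposite direction
print(shared_segments(ring_a, ring_b))  # {frozenset({2, 3})}
```

This only finds the duplicated edges; actually splitting the ways and rewriting the relation memberships is the harder part.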

A third thing that may need to be resolved is map features that 
cross the boundaries of the NTS tiles. Do we want to merge them? If these 
features have the same Geobase metadata (ID, etc.), it shouldn't be 
a big problem; otherwise we need to decide whether we prefer to keep the 
metadata or to have merged features.
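In the easy case (matching Geobase metadata), finding the merge candidates is just a grouping step. Here is a sketch of that; the attribute name "nid" for the Geobase ID is my assumption for the example, not necessarily what the Canvec files call it:

```python
# Sketch: group features from adjacent NTS tiles by their Geobase id
# (assumed field name "nid"), so pieces of one real-world feature that
# were clipped at a tile boundary can be identified for merging.
from collections import defaultdict

def merge_candidates(features):
    """Return ids that occur on more than one tile, with their pieces."""
    groups = defaultdict(list)
    for feat in features:
        groups[feat["nid"]].append(feat)
    return {nid: fs for nid, fs in groups.items() if len(fs) > 1}

tiles = [
    {"nid": "abc123", "tile": "092B05"},
    {"nid": "abc123", "tile": "092B06"},   # same feature, next tile over
    {"nid": "def456", "tile": "092B05"},
]
print(list(merge_candidates(tiles)))  # ['abc123']
```

The hard case, features with differing metadata, can't be resolved mechanically like this; that's the keep-metadata-or-merge decision above.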

All of this might suggest that we can't do anything to clean up the 
data. We can, but only after an initial upload to the server. At that 
point we can still apply whatever logic is needed to deal with duplicate 
nodes, reuse of ways in multiple relations, and merging features. The 
script would have to work live on the server: download an area, do the 
cleanup, and upload. In that case I think it would be safest (and 
required!) for the script to do only the download and the cleanup, 
with a human verifying the result before upload. If we implement such 
a cleanup, it needs to run as soon as possible after the upload, 
because sometimes users are very quick to make changes to freshly 
uploaded data.

Whew, another long one. I hope you don't mind :) Any thoughts about this 
essay? Keep in mind this is just my opinion, and by no means what 
we should actually do. Many of you know the Canvec data better than I 
do, so you'll also know better whether this approach makes sense.

Cheers,

Frank
