[Imports] canvec-to-osm 0.9.6 now available
Frank Steggink
steggink at steggink.org
Sat Nov 7 14:59:18 GMT 2009
Sam Vekemans wrote:
> I'm not making a new script; the folks who understand Python are
> making a script that can solve the 'duplicate intersecting nodes' and
> 'inner/outer relation' problems.
>
> (Yes, you might have fixed it, but "I've reached the crux of my
> technical ability".)
>
> Sure, they can use Java if that works for them.
>
> I'm just isolating my canvec-to-osm script to handle the 80 map
> features that I know convert correctly. The other 10 can be dealt
> with using another option.
>
> Since I don't know how to program (except in basic DOS), I'm not much help.
>
> Others who are experienced in Python & Java are welcome to take the lead.
>
> Frank already started to help out maptastically :)
>
> Sam
>
> On 11/6/09, Ian Dees <ian.dees at gmail.com> wrote:
>
>> On Nov 6, 2009, at 9:28 PM, Sam Vekemans
>> <acrosscanadatrails at gmail.com> wrote:
>>
>>
>>> This Python script, 'canvec2osm.py', is yet to be made. It's open to
>>> anyone to make, and I recommend that whoever does it, ONLY 1 person
>>> is in charge of maintaining the script.
>>>
>> Why are you creating another shp to osm converter? Is there something
>> the existing tools don't do? I thought you were using shp-to-osm? What
>> changed?
>>
Hi Sam,
The idea I mentioned is to convert your batch files to Python. The
main reason is that people using Linux or other OSs can then also
perform the conversion locally. I know you want to convert all of
Canada yourself, but that is still a lot of work, and many hands make
light work. The only thing we need to be careful about is that we're
all using the same rule file. This is the most vital part of ensuring
consistency across the country.
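As a rough sketch of what a cross-platform replacement for the batch files could start with (assuming Python 3; the rule-file name `rules.txt` and the `plan_conversion` helper are hypothetical), the one thing it must get right is that every shapefile is converted with the same rule file:

```python
from pathlib import Path

# Hypothetical name for the single, shared rule file that everyone uses.
SHARED_RULES = Path("rules.txt")


def plan_conversion(shapefiles, out_dir, rules=SHARED_RULES):
    """Return one conversion job per shapefile, all pointing at the same rules file.

    `shapefiles` is any iterable of paths, e.g. Path("some_tile").glob("*.shp").
    """
    out_dir = Path(out_dir)
    jobs = []
    for shp in sorted(Path(p) for p in shapefiles):
        if shp.suffix.lower() != ".shp":
            continue  # skip companion files such as .dbf, .shx, .prj
        jobs.append({
            "shapefile": shp,
            "rules": rules,  # identical for every job: this is what keeps output consistent
            "output": out_dir / (shp.stem + ".osm"),
        })
    return jobs
```

Each job would then be handed to shp-to-osm, e.g. with subprocess.run; its exact command-line options aren't covered in this thread, so they are left out here.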
To Ian: shp-to-osm won't disappear in the process. Sam's batch files are
still calling it. It would be pointless to create a second version.
There is already a Python script with the same name, but it was written
specifically for the MassGIS import.
Sam, can you clarify your intentions a bit more? I'm afraid I'm not
following them well. When you mention removing duplicate nodes and
the inner/outer relation problem, it sounds as if you intend to create
a script that does some post-processing. Is that correct? I haven't
started anything in that area. (I actually still need to start on the
Python version of your batch files, but I'm going to work on that
today.)
While we're on the subject: in shp-to-osm (Java), tags are now put
properly on the multipolygon relationship. They also still appear on
the inner polygons (I've mentioned this to Ian already), but that
should be fixed.
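A minimal sketch of that fix, assuming a simple in-memory model (hypothetical, not shp-to-osm's actual data structures) where a relation holds its tags and a list of (way id, role) members: any tag on an inner way that just repeats the relation's tag is dropped.

```python
def strip_inner_tags(relation, ways):
    """Remove tags from inner-member ways that merely duplicate the relation's tags.

    relation: {"tags": {...}, "members": [(way_id, role), ...]}  (hypothetical model)
    ways:     {way_id: {"tags": {...}}}  -- modified in place and returned
    """
    rel_tags = relation["tags"]
    for way_id, role in relation["members"]:
        if role != "inner":
            continue  # outer ways are left alone in this sketch
        tags = ways[way_id]["tags"]
        # Collect keys first so the dict isn't changed while iterating over it.
        for key in [k for k, v in tags.items() if rel_tags.get(k) == v]:
            del tags[key]
    return ways
```

Tags on the inner way that differ from the relation's (a note, say) are kept, since they may describe the hole itself.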
Since shp-to-osm is called for one feature type at a time, new
challenges arise when multiple feature types are involved. I guess
you've been thinking about that already. Duplicate nodes become an
issue when you have, for example, a residential area adjacent to a
wooded area (assuming that the boundaries match exactly). This will be
difficult to deal with. I'm not sure whether it would be technically
possible to adjust shp-to-osm for that, but even if it were, the
output files would become huge. They already have to be split up for
certain feature types, and I don't think it is possible to use the
same set of IDs across multiple output files.
From what I understand about the upload process (and someone please
correct me if this isn't right), the OSM server returns new ID numbers
for any nodes, ways, and relationships that are uploaded. In the OSM
files generated by Ian, and also when you're editing in JOSM yourself,
temporary IDs are assigned. They have negative values, which indicate
that these objects don't exist on the server yet. This means that after
you have uploaded file00.osm and opened file01.osm, neither JOSM nor
the server remembers which objects any IDs are _referring_ to, if
those objects are not _defined_ in the same file.
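A small check along these lines (a sketch using Python's standard xml.etree.ElementTree; `undefined_references` is a hypothetical helper) would list every node or way a file refers to without defining it. Any negative ID in that list is a placeholder that can't be resolved once the file that defined it has already been uploaded:

```python
import xml.etree.ElementTree as ET


def undefined_references(osm_xml):
    """Return sorted (type, id) pairs referenced in an OSM XML string but not defined in it."""
    root = ET.fromstring(osm_xml)
    # Objects defined at the top level of this file, keyed by (type, id string).
    defined = {(el.tag, el.get("id")) for el in root if el.tag in ("node", "way", "relation")}

    missing = set()
    for way in root.iter("way"):
        for nd in way.iter("nd"):  # ways refer to nodes
            if ("node", nd.get("ref")) not in defined:
                missing.add(("node", int(nd.get("ref"))))
    for rel in root.iter("relation"):
        for member in rel.iter("member"):  # relations refer to nodes, ways or relations
            key = (member.get("type"), member.get("ref"))
            if key not in defined:
                missing.add((key[0], int(key[1])))
    return sorted(missing)
```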
The same issue arises with multipolygon relationships in which some
of the ways are reused. This only works if everything is defined in
the same file, and such a file would be far too large to upload safely
to the server. Recently I noticed that the server starts to "act
difficult" when you try to create/update/delete around 10k objects at
once. Regarding relationships and reuse of geometry: I think we not
only have to remove duplicate nodes but also split up ways; otherwise
the JOSM validator will complain about overlapping ways. A single way
can be used in multiple relationships.
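To find where ways would need to be split, one rough approach (assuming duplicate nodes have already been merged, so that shared boundaries use the same node IDs) is to list every segment used by more than one way; those runs are candidates for becoming a separate way shared by the relationships:

```python
from collections import defaultdict


def shared_segments(ways):
    """Map each undirected segment used by more than one way to the sorted ids of those ways.

    ways: {way_id: [node_id, ...]}
    """
    users = defaultdict(set)
    for way_id, nds in ways.items():
        for a, b in zip(nds, nds[1:]):
            # Store segments undirected, so A->B and B->A count as the same edge.
            users[(min(a, b), max(a, b))].add(way_id)
    return {seg: sorted(ids) for seg, ids in users.items() if len(ids) > 1}
```

For two closed areas touching along one edge, only that edge is reported.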
A third issue that might need to be resolved is map features that
cross the boundaries of the NTS tiles. Do we want to merge them? If
these features have the same Geobase metadata (ID, etc.), it shouldn't
be a big problem; otherwise we need to decide whether we prefer to
keep the metadata or to have merged features.
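If we do decide to merge, a sketch could group the pieces by their Geobase ID (read from whatever tag holds it; the grouping key and the helpers here are hypothetical) and only join pieces with the same ID that share an end node:

```python
from collections import defaultdict


def merge_pieces(pieces):
    """Join line pieces (lists of node ids) that share end nodes into as few lines as possible."""
    lines = [list(p) for p in pieces]
    merged = True
    while merged:
        merged = False
        for i in range(len(lines)):
            for j in range(i + 1, len(lines)):
                a, b = lines[i], lines[j]
                if a[-1] == b[0]:      # a then b
                    joined = a + b[1:]
                elif a[-1] == b[-1]:   # a then reversed b
                    joined = a + b[-2::-1]
                elif a[0] == b[-1]:    # b then a
                    joined = b + a[1:]
                elif a[0] == b[0]:     # reversed b then a
                    joined = b[::-1] + a[1:]
                else:
                    continue
                lines[i] = joined
                del lines[j]
                merged = True
                break
            if merged:
                break  # restart the scan, since the list changed
    return lines


def merge_by_feature_id(features):
    """features: iterable of (feature_id, node_list). Pieces are only joined when ids match."""
    groups = defaultdict(list)
    for fid, nodes in features:
        groups[fid].append(nodes)
    return {fid: merge_pieces(p) for fid, p in groups.items()}
```

Pieces with different IDs are never joined, even if they touch, which keeps the metadata intact.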
Does all of this mean we can't do anything to clean up the data? Sure
we can, but only after an initial upload to the server. At that point
we can still apply any logic to deal with duplicate nodes, reuse of
features in multiple relationships, and merging features. The script
will have to work live on the server: download an area, do the
cleanup, and upload. In that case I think it would be safest (and
required!) for the script to do only the download and the cleanup, and
for a human to verify the result before the upload. If we implement
such a cleanup, it needs to run as soon as possible after the initial
upload, because users are sometimes very quick to make changes to
freshly uploaded data.
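The download/cleanup/review/upload cycle could be structured roughly as below. The bounding-box download uses the API 0.6 `map` call; the download, cleanup, review, and upload hooks are hypothetical callables supplied by whoever writes the script. The key point is that nothing is uploaded unless a person approves:

```python
API = "https://api.openstreetmap.org/api/0.6"


def map_url(left, bottom, right, top):
    """URL of the API 0.6 'map' call, which returns all data inside a bounding box."""
    return f"{API}/map?bbox={left},{bottom},{right},{top}"


def cleanup_session(bbox, download, cleanup, human_approves, upload):
    """Download an area, clean it, and upload ONLY after a person approves the result.

    download, cleanup, human_approves and upload are caller-supplied (hypothetical) hooks.
    Returns True if the cleaned data was uploaded.
    """
    data = download(map_url(*bbox))
    cleaned = cleanup(data)
    if not human_approves(data, cleaned):
        return False  # nothing goes back to the server without review
    upload(cleaned)
    return True
```

Keeping the review step as a separate hook means the cleanup code can be tested unattended while still requiring a human decision before anything touches the live database.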
Whew, another long one. I hope you don't mind :) Any thoughts about
this essay? Keep in mind this is just my opinion, and not necessarily
what we should actually do. Many of you know the Canvec data better
than I do, so you'll also know better whether this approach makes
sense.
Cheers,
Frank