[Talk-ca] Dealing with huge features

Sam Vekemans acrosscanadatrails at gmail.com
Tue Nov 10 05:59:05 GMT 2009


Hi Frank,

Yup, the wooded area file... 'wooded area' actually covers most of Canada.
:) ... unfortunately it's not all protected_area :(

I omitted that from my Windows (Java) version of the
'canvec-to-osm_0_9_7_3.zip' script.
http://www.mediafire.com/?sharekey=3b30da6df507290219747bd91027d4dd9de794d9333c8264

I also added a "#" in front of all of the 'inner,' lines, so it omits them all.
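
As a rough sketch of that commenting-out step, assuming the rules file is plain text with one rule per line and that the inner-polygon rules start with `inner,`. The function name and the sample lines are made up for illustration and are not taken from the real rules file:

```python
def comment_out_inner(lines):
    """Prefix '#' to every rule line that starts with 'inner,'."""
    result = []
    for line in lines:
        # Lines that are already commented out start with '#', so they
        # fail this check and pass through unchanged.
        if line.lstrip().startswith("inner,"):
            result.append("#" + line)
        else:
            result.append(line)
    return result


# Hypothetical sample lines, only to show the effect.
sample = ["outer,EXAMPLE,,natural,wood", "inner,EXAMPLE,,natural,wood"]
print("\n".join(comment_out_inner(sample)))
```

Running it twice over the same lines is harmless, since a line that already starts with "#" is left alone.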

... my guess is that we need to handle these 'natural_features' in a
separate process,
just like the Corine import in France, where this particular feature can be
added in, and it won't interfere (if at all) with the existing data.  This is
also true for the 'waterbody' and 'rivers' features.

This is because, the only way to map the rivers is from imagery... or
walking to a river and tossing the GPS over the river a few times.  (or
putting the GPS in a little remote-control boat)  and floating it down the
river :-)  ... and running REAL FAST (or someone else could stop it)

... and for wooded areas... satellite imagery can see what's wood and what's
'been logged'... literally...

So my solution is that we separate these 2 processes (all 90 features), so
that 80 of them (those that I can handle) get converted with the shp-to-osm
script (DOS/Java), and those other 9 features get processed using
bulk_import.py (or the canvec2osm superscript), maybe using Emilie
Laffray's process, or another process?
... it's the kind of thing that could get sent to the server (down-stream) and
just bulk-imported at an 'off_peak_hour', provided, of course, that care
is taken. So it should really only be the 'expert_team' that does this
and coordinates efforts.
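
A minimal sketch of that split, assuming the features are identified by simple names. The names in `BULK_FEATURES` stand in for the large natural features mentioned above; the others are hypothetical placeholders, and none of this comes from the actual scripts:

```python
# Assumption: these names stand in for the large natural features.
BULK_FEATURES = {"wooded_area", "waterbody", "rivers"}


def split_features(features, bulk=BULK_FEATURES):
    """Return (regular, heavy): features for the normal conversion,
    and features that should go through the separate bulk process."""
    regular = [f for f in features if f not in bulk]
    heavy = [f for f in features if f in bulk]
    return regular, heavy


# "building" and "road" are placeholder names for ordinary features.
regular, heavy = split_features(["building", "wooded_area", "road", "rivers"])
print("normal conversion:", regular)
print("separate bulk process:", heavy)
```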

Yan Morin created a Python version of the GeobaseNHN (back in the summer),
and successfully loaded the water features from the Quebec (north east Ottawa
'waterbasin') area.

It took a 'really LONG time' because it was so slow, as the data was divided
into 'chunks'.
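
Just to illustrate what dividing an upload into chunks can look like (this is not Yan's script), here is a minimal sketch. It assumes each way is described by a hypothetical `(way_id, node_count)` pair and that the only constraint is a maximum number of nodes per batch:

```python
def batch_ways(way_node_counts, max_nodes):
    """Group ways, in order, so each batch holds at most max_nodes nodes.

    A single way that is larger than max_nodes gets a batch of its own,
    because this sketch never splits a way.
    """
    batches, current, total = [], [], 0
    for way_id, count in way_node_counts:
        # Start a new batch when adding this way would exceed the limit.
        if current and total + count > max_nodes:
            batches.append(current)
            current, total = [], 0
        current.append(way_id)
        total += count
    if current:
        batches.append(current)
    return batches


# Made-up way ids and node counts.
ways = [("a", 1200), ("b", 900), ("c", 500), ("d", 2500), ("e", 100)]
for number, batch in enumerate(batch_ways(ways, 2000), start=1):
    print(number, batch)
```

More batches means more round trips to the server, which is one reason a chunked upload takes longer overall.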

Here's the lowdown:
Basically, as long as the feature is 'technically accurate' when it goes to
upload, the fact that it takes such a long time is not that big of a deal.

IF we use the power of the community, we can have 20 people working at
simply uploading these bite-sized files, where 1 person is in charge of a
tile area (and responsible for making it look good).

The fact that Canada is SO BIG... should NOT be a deterrent... once these
files are available...
I'll be contacting local communities... and naturally folks all across the
country will be joining in, and asking how they can help.

(and people already are asking me)

I would like to be able to say to them:
"Hi, what tile area are you in?" (or what city, and I can tell them the tile
number).
Then I can give them a link to the folder on the NRCan ftp site where
they can find all the files that need to be loaded, and I can update the
Google Docs chart with their name (and give them a link to it, so they can
update the chart as they go).
... then they can go bananas and upload as they like.  They CAN'T mess it
up, because the system is set up so that the technical barrier is already there.
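
The sign-up chart could be as simple as a mapping from tile number to the person in charge. A minimal sketch, assuming one person per tile; the function name and the volunteer names are made up, and the real chart would live in the shared spreadsheet rather than in code:

```python
def claim_tile(chart, tile, person):
    """Record person as responsible for tile; refuse if someone else has it."""
    owner = chart.get(tile)
    if owner is not None and owner != person:
        raise ValueError(f"tile {tile} is already claimed by {owner}")
    chart[tile] = person
    return chart


chart = {}
claim_tile(chart, "021L03", "volunteer_a")  # hypothetical volunteer names
claim_tile(chart, "021L10", "volunteer_b")
print(chart)
```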

... and hey, once those files are uploaded to OSM, what we can do is 'hide'
those .zip files, by replacing each .zip file with an 'empty' (no .osm files)
.zip file. That is possible to do, if needed... just an idea...

... it IS a one-way import after all

.. P.S. that .xml file version is a good idea, as it automatically puts
those render=no tags in there, so it's a 'preventative step'?

That's all for now,
I hope that makes sense (at least it did in my head)

Cheers,
Sam

On Mon, Nov 9, 2009 at 7:47 PM, Frank Steggink <steggink at steggink.org> wrote:

> Frank Steggink wrote:
> > Hi,
> >
> > While testing out the Python version of canvec-to-osm, I came across a
> > couple of huge features. Especially wooded areas have the tendency to
> > grow large.
> >
> > The second file I looked at (NTS tile 021L03), was already 3.1 MB large,
> > despite that I set maxnodes to 2000 in shp-to-osm. It was mostly
> > occupied by a giant multipolygon, which contains 321 members and more
> > than 30k nodes. When I opened this file in JOSM, it was really
> > struggling with it. Uploading this would be a real nightmare.
> >
> > Since the feature occupied less than half of the NTS tile, there would
> > even be room for several such features. In this scenario it is easy to
> > imagine that the nodes limit for getting data from the OSM server is
> > exceeded. I don't think this is a desirable situation, but I don't know
> > a clear solution how to deal with this.
> >
> > Although splitting up the features is not a good idea, it would at least
> > provide a means to upload the data in smaller chunks, and be able to
> > retrieve a part of the data, provided that the tile doesn't exceed the
> > server limit. Hopefully JOSM would also be more performant.
> >
> > For those curious, I have uploaded this OSM file here: [1]. It is part
> > of this area: [2].
> > Anyways, check the file out for yourself, and please share any ideas how
> > we should deal with a situation like this.
> >
> > In the meantime I noticed that NTS tile 021L10 contains even a 4.1 MB
> > large file. If there are roughly 10k nodes per MB, this would mean that
> > all multipolygon members would contain at least 40k nodes...
> >
> > Regarding the Python script: the first version is nearly complete. I
> > want to make a couple of small changes to it, and also check if
> > everything looks OK in JOSM, and perhaps generate a couple of tiles from
> > it (locally). Once that is done, I'll make it available to whoever is
> > interested. It takes roughly 35 mins to convert all features (except
> > highways and hydro) in NTS tile 021L. Executing shp-to-osm costs most of
> > the time. The script also downloads any missing Canvec SHP files, but
> > they were already downloaded.
> >
> > Cheers,
> >
> > Frank
> >
> > [1] http://www.steggink.org/osm/Canvec_test/021l03_VE_1240009_2_Wooded_area10.osm.zip
> > [2] http://www.openstreetmap.org/?minlon=-71.5&minlat=46&maxlon=-71&maxlat=46.25&box=yes
> >
> >
> > _______________________________________________
> > Talk-ca mailing list
> > Talk-ca at openstreetmap.org
> > http://lists.openstreetmap.org/listinfo/talk-ca
> >
> By the way, the rules file I used still defines tags for the inner
> polygons. Regarding the performance I don't think that matters much.
> The other file I checked has 43200 nodes in one file, so the huge
> multipolygon contains at least 41201 nodes. It has 745 multipolygon
> members.
>
> Frank
>
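
For reference, the node figures in the quoted messages can be checked with a couple of lines. This is only a sketch of the arithmetic; the function names are made up. It assumes the "roughly 10k nodes per MB" rule of thumb from the quote, and it reads the 41201 figure as "everything else in the file stays below maxnodes (2000)", which is an interpretation rather than something stated outright:

```python
def estimate_nodes(size_mb, nodes_per_mb=10_000):
    """Rough node count from file size, using the quoted rule of thumb."""
    return int(round(size_mb * nodes_per_mb))


def min_nodes_in_largest(total_nodes, max_nodes):
    """Lower bound for the largest feature, assuming the other features
    in the file together hold fewer than max_nodes nodes."""
    return total_nodes - (max_nodes - 1)


print(estimate_nodes(4.1))                # rough estimate for the 4.1 MB file
print(min_nodes_in_largest(43200, 2000))  # lower bound for the huge multipolygon
```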