[Talk-ca] Dealing with huge features

Sam Vekemans acrosscanadatrails at gmail.com
Tue Nov 10 06:33:32 GMT 2009


One thing I forgot to mention,

I'll use my trusty example on Vancouver Island

http://www.openstreetmap.org/?lat=48.599&lon=-124.63&zoom=10&layers=B000FTFT

Because the script 'chops up' that wooded area shp file into 'bite-sized
chunks' (2000 nodes), once all the chunks get uploaded it turns out correct.

I.e. the 'inners' get loaded 1st (and it looks funny), but once the 'outers'
are loaded, it seems to fix itself.

One way to handle that is this method (I might have used it for the area).

1 - load that big file into JOSM
2 - select the big polygon (outer ring) and 'delete'
3 - the result will look funny (but all those 'inners' that are green are
still tagged correctly)
4 - upload all those 'inners' to OSM (select the top half and upload those 1st,
then the other half, copying each selection to a new layer)
5 - when that's done... take a break :)
6 - in JOSM, hide all the other layers (except the main osm.data)
7 - select that big outer ring and copy it to a new layer
8 - check the node count (if it's STILL too big, cut it in half)
9 - upload that
10 - then when it renders in OSM.. it should be correct.
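
For step 8, here is a rough sketch (not part of any existing script) of how
the node count of an .osm file could be checked before uploading. The 2000
limit is an assumption taken from the maxnodes value mentioned in this
thread, and the function names are made up for illustration. Cutting a
polygon adds a few nodes along the cut, so the number of halvings is only
an estimate.

```python
import io
import math
import xml.etree.ElementTree as ET

# Assumption: same value as the maxnodes setting discussed in this thread.
MAX_NODES = 2000


def count_nodes(osm_source):
    """Count <node> elements in an .osm XML file (path or binary file object)."""
    count = 0
    # iterparse streams the file, so a 4 MB .osm file is not held in memory.
    for _event, elem in ET.iterparse(osm_source, events=("end",)):
        if elem.tag == "node":
            count += 1
        elem.clear()  # free the element once it has been counted
    return count


def halvings_needed(node_count, max_nodes=MAX_NODES):
    """Estimate how many times a feature must be cut in half to fit the limit."""
    halvings = 0
    while node_count > max_nodes:
        node_count = math.ceil(node_count / 2)
        halvings += 1
    return halvings


if __name__ == "__main__":
    # Tiny hand-made sample, only to show the call; real files come from shp-to-osm.
    sample = (
        b"<osm version='0.6'>"
        b"<node id='-1' lat='48.6' lon='-124.6'/>"
        b"<node id='-2' lat='48.7' lon='-124.6'/>"
        b"<node id='-3' lat='48.7' lon='-124.5'/>"
        b"<way id='-4'><nd ref='-1'/><nd ref='-2'/><nd ref='-3'/><nd ref='-1'/></way>"
        b"</osm>"
    )
    print(count_nodes(io.BytesIO(sample)), halvings_needed(30000))
```

If the count comes back under the limit, step 9 can go ahead as is;
otherwise cut the ring and check again.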

Another option is to use PostGISmess
- (my guess) you load the shp file into OpenJUMP and then use the
'split area' option, where it cuts the whole thing into 5x5 (nice and
round 0.1 degree squares).
- then save each square under a new file name (using A-Y).
- then let shp-to-osm do its tricks on each area.
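
To make the 5x5 idea concrete, here is a minimal sketch of how the 25
squares and their A-Y labels could be worked out for a given bounding box.
This is not what OpenJUMP or shp-to-osm does internally; the function name,
the example bounding box, and the labelling order (A in the south-west
corner, going east, then north) are all assumptions for illustration.

```python
import string


def split_bbox(min_lon, min_lat, max_lon, max_lat, rows=5, cols=5):
    """Split a bounding box into rows x cols cells labelled A, B, C, ...

    Returns {letter: (min_lon, min_lat, max_lon, max_lat)}.
    Assumed order: A is the south-west cell, letters run west to east,
    then row by row towards the north.
    """
    if rows * cols > len(string.ascii_uppercase):
        raise ValueError("more cells than letters available")
    lon_step = (max_lon - min_lon) / cols
    lat_step = (max_lat - min_lat) / rows
    cells = {}
    for row in range(rows):
        for col in range(cols):
            letter = string.ascii_uppercase[row * cols + col]
            cells[letter] = (
                min_lon + col * lon_step,
                min_lat + row * lat_step,
                min_lon + (col + 1) * lon_step,
                min_lat + (row + 1) * lat_step,
            )
    return cells


if __name__ == "__main__":
    # Hypothetical 0.5 x 0.5 degree box, which gives 0.1 degree squares.
    for letter, bbox in split_bbox(-71.5, 46.0, -71.0, 46.5).items():
        print(letter, bbox)
```

Each labelled box could then be used to clip the shp file and name the
output before handing it to shp-to-osm.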

(Adam talked about that as a way to deal with downtown Vancouver with the
automatch roads)
... so it would be only 1 or 2 tiles out of that, that would need to be
manually worked on.



Cheers,
Sam

P.S. I know the tags on the example are wrong (inner/outer/relation)... but
it still renders nicely :-) ... the file size was similar.
... and ya, I didn't upload much more of it... because of the reason you
mentioned.

On Mon, Nov 9, 2009 at 9:59 PM, Sam Vekemans
<acrosscanadatrails at gmail.com> wrote:

> Hi Frank,
>
> Yup, the wooded area file.... 'wooded area' actually covers most of Canada.
> :) ... unfortunately it's not all protected_area :(
>
> I omitted that from my Windows/Java version of the
> 'canvec-to-osm_0_9_7_3.zip' script.
>
> http://www.mediafire.com/?sharekey=3b30da6df507290219747bd91027d4dd9de794d9333c8264
>
> I also added a "#" in front of all of 'inner,' lines.. so it omits it all.
>
> ... my guess is that we need to handle these 'natural_features' in a
> separate process.
> Just like the Corine import in France, where this particular feature can
> be added in, and it won't interfere (if at all) with the existing data. This
> is also true for the 'waterbody' and 'rivers'.
>
> This is because, the only way to map the rivers is from imagery... or
> walking to a river and tossing the GPS over the river a few times.  (or
> putting the GPS in a little remote-control boat)  and floating it down the
> river :-)  ... and running REAL FAST (or someone else could stop it)
>
> ... and for wooded areas... satellite imagery can see what's wood and what's
> 'been logged'... literally...
>
> So my solution is that we separate these 2 processes (all 90 features) so
> that 80 of them (those that I can handle) get converted with the shp-to-osm
> script (dos/java), then those other 9 features get processed using
> bulk_import.py ... (or the canvec2osm superscript), maybe using Emilie
> Laffray's process?? ... or another process??
> ... it's the kind of thing that could get sent to the server (down-stream) and
> just bulk_implopped at an 'off_peak_hour' (provided, of course, that care
> was given), so it should really only be the 'expert_team' that does this
> and coordinates efforts.
>
> Yan Morin created a Python version of the GeobaseNHN (back in the summer),
> and successfully loaded the water features from the Quebec (north east Ottawa
> 'waterbasin') area.
>
> It took a 'really LONG time' because it was so slow. ... as it was divided
> into 'chunks'
>
> Here's the lowdown:
> Basically, as long as the feature is 'technically accurate' when going to
> upload, the fact that it takes such a long time is not that big of a deal.
>
> IF we use the power of the community, we can have 20 people working at
> simply uploading these bite-sized files, where 1 person is in charge of a
> tile area (and responsible for making it look good).
>
> The fact that Canada is SO BIG... should NOT be a deterrent. .. once these
> files are available...
> I'll be contacting local communities... and 'naturally' folks all across
> the country will be joining in, and asking how they can help.
>
> (and people already are asking me)
>
> I would like to be able to say to them:
> "Hi, what tile area are you in?" (or what city, and I can tell them the tile
> number)
> Then I can give them a link to the folder on the NRCan ftp site where
> they can find all the files that need to be loaded. I can then update
> the google docs chart with their name (and give them a link to it, so they
> can update the chart as they go).
> ... then they can go bananas and upload as they like. (They CAN'T mess
> it up, because the system is in place where the technical barrier is there.)
>
>
> ... and hey, once those files are uploaded to OSM, what we can do is 'hide'
> those .zip files by replacing each .zip file with an 'empty' (no .osm files)
> .zip file. That is possible to do (if needed).... just an idea ..
>
> ... it IS a one-way import after-all
>
> .. ps. that .xml file version is a good idea.. as it automatically puts
> those render=no tags in there, so it's a 'preventative step'?
>
> That's all for now,
> I hope that makes sense (at least it did in my head)
>
> Cheers,
> Sam
>
>
> On Mon, Nov 9, 2009 at 7:47 PM, Frank Steggink <steggink at steggink.org> wrote:
>
>> Frank Steggink wrote:
>> > Hi,
>> >
>> > While testing out the Python version of canvec-to-osm, I came across a
>> > couple of huge features. Especially wooded areas have the tendency to
>> > grow large.
>> >
>> > The second file I looked at (NTS tile 021L03), was already 3.1 MB large,
>> > despite that I set maxnodes to 2000 in shp-to-osm. It was mostly
>> > occupied by a giant multipolygon, which contains 321 members and more
>> > than 30k nodes. When I opened this file in JOSM, it was really
>> > struggling with it. Uploading this would be a real nightmare.
>> >
>> > Since the feature occupied less than half of the NTS tile, there would
>> > even be room for several such features. In this scenario it is easy to
>> > imagine that the nodes limit for getting data from the OSM server is
>> > exceeded. I don't think this is a desirable situation, but I don't know
>> > a clear solution how to deal with this.
>> >
>> > Although splitting up the features is not a good idea, it would at least
>> > provide a means to upload the data in smaller chunks, and be able to
>> > retrieve a part of the data, provided that the tile doesn't exceed the
>> > server limit. Hopefully JOSM would also be more performant.
>> >
>> > For those curious, I have uploaded this OSM file here: [1]. It is part
>> > of this area: [2].
>> > Anyways, check the file out for yourself, and please share any ideas how
>> > we should deal with a situation like this.
>> >
>> > In the meantime I noticed that NTS tile 021L10 contains even a 4.1 MB
>> > large file. If there are roughly 10k nodes per MB, this would mean that
>> > all multipolygon members would contain at least 40k nodes...
>> >
>> > Regarding the Python script: the first version is nearly complete. I
>> > want to make a couple of small changes to it, and also check if
>> > everything looks OK in JOSM, and perhaps generate a couple of tiles from
>> > it (locally). Once that is done, I'll make it available to whoever is
>> > interested. It takes roughly 35 mins to convert all features (except
>> > highways and hydro) in NTS tile 021L. Executing shp-to-osm costs most of
>> > the time. The script also downloads any missing Canvec SHP files, but
>> > they were already downloaded.
>> >
>> > Cheers,
>> >
>> > Frank
>> >
>> > [1]
>> >
>> http://www.steggink.org/osm/Canvec_test/021l03_VE_1240009_2_Wooded_area10.osm.zip
>> > [2]
>> >
>> http://www.openstreetmap.org/?minlon=-71.5&minlat=46&maxlon=-71&maxlat=46.25&box=yes
>> >
>> >
>> > _______________________________________________
>> > Talk-ca mailing list
>> > Talk-ca at openstreetmap.org
>> > http://lists.openstreetmap.org/listinfo/talk-ca
>> >
>> By the way, the rules file I used still defines tags for the inner
>> polygons. Regarding the performance I don't think that matters much.
>> The other file I checked has 43200 nodes in one file, so the huge
>> multipolygon contains at least 41201 nodes. It has 745 multipolygon
>> members.
>>
>> Frank
>>
>>
>
>