[Talk-ca] importing GeoBase Data (learning from TIGER)

Sam Vekemans acrosscanadatrails at gmail.com
Thu Nov 27 01:38:22 GMT 2008


Thanks, I have updated the wiki to include more points.

Idea: shut down OSM for the upload of GeoBase road data? = scrubbed. We (1
person) can load the data tile by tile (using the conversion script;
hopefully Ibycus can help).
Issue with duplicate ID numbers? = the routine would be set to retry and
load only the data that was skipped ... so there is probably a script that
would work. ... However, if there is a way to load it all at once,
I'm sure the Foundation would go for it. ... But going at it tile by
tile makes more sense, to keep the accuracy and lower the risk of
error.
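
A rough sketch of the "retry and only load the skipped data" idea, in Python. This is not Ibycus's script; the `upload` callback, the feature dicts and the `max_retries` setting are all made up for illustration:

```python
from typing import Callable, Iterable, List, Tuple


def import_tile(
    features: Iterable[dict],
    upload: Callable[[dict], bool],
    max_retries: int = 3,
) -> Tuple[List[dict], List[dict]]:
    """Upload one tile's features, retrying only the ones that were skipped.

    `upload` is a hypothetical callback: it returns True when the feature
    was accepted and False when it was skipped (e.g. because of an ID clash).
    Returns (uploaded, still_skipped).
    """
    pending = list(features)
    uploaded: List[dict] = []
    # One first pass plus up to `max_retries` retry passes.
    for _ in range(max_retries + 1):
        skipped = []
        for feature in pending:
            if upload(feature):
                uploaded.append(feature)
            else:
                skipped.append(feature)
        pending = skipped  # the next pass only touches what was skipped
        if not pending:
            break
    return uploaded, pending
```

Whatever is still in the second list after the last pass would need a human to look at it before moving on to the next tile.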

Idea: make it all render=no, to avoid having 'ghost lines' in
'mostly complete' areas. = almost scrubbed.
Solution: going at it tile by tile, starting with 001001 (see the GeoBase
map), and using the wiki page to show a chart of each tile as it's
done, would help ... and make the process consistent.
For most of Canada it would be fine NOT to include the render=no
tag, and so, with a tile import, we ALL can go at it and look for
duplicates that the script misses. ... Once we're happy that it looks
good, we go on to the next tile (we don't need to physically be
there) and can fix the 'ghost lines' ...

Issue: Duplicate edges
I think Ibycus has fixed that with his script (converting the stuff
to Polish format). Remember that the way the GeoBase roads are set up,
each road is broken up block by block. Most Canadian roads are in a
grid pattern and line up fine. I don't see an issue with it.

Issue: Uploading polygons that cross bounding box tiles.
Ya, again, the Ibycus topo had that issue, and it was solved in some way.
We need to remember that these boxes are 0.5 x 0.25 degrees big. ...
Manual fixing would still be needed (on top of that fix), since we need
to add missing data such as the right park name and the relation
(e.g. BC Parks), and trails etc. all need to be added. BTW, the newer
version of the map contains combined tile sets, making for a smaller
number of files to load.

Solution: don't know yet.
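
Not a solution, but a small Python sketch of how the polygons that cross a tile edge could at least be flagged for manual review. The assumptions are mine: that the 0.5 is degrees of longitude and the 0.25 is degrees of latitude, that polygons are plain lists of (lon, lat) points, and all the names are made up. It only looks at vertices, so a polygon that cuts across a tile without having a vertex inside it would be missed:

```python
from typing import List, Tuple

TILE_WIDTH_DEG = 0.5    # assumed: east-west size, degrees of longitude
TILE_HEIGHT_DEG = 0.25  # assumed: north-south size, degrees of latitude

Point = Tuple[float, float]               # (lon, lat)
BBox = Tuple[float, float, float, float]  # (west, south, east, north)


def tile_bbox(west: float, south: float) -> BBox:
    """Bounding box of the tile whose south-west corner is (west, south)."""
    return (west, south, west + TILE_WIDTH_DEG, south + TILE_HEIGHT_DEG)


def crosses_tile_edge(polygon: List[Point], bbox: BBox) -> bool:
    """True when some vertices lie inside the tile and some lie outside."""
    west, south, east, north = bbox
    inside = [
        west <= lon <= east and south <= lat <= north
        for lon, lat in polygon
    ]
    return any(inside) and not all(inside)
```

Anything flagged this way (a park split over two tiles, say) could go on a list to be joined up by hand after both tiles are loaded.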

Issue: OSM file size (i.e. Toronto duplicates)
Well, ya ... the Ibycus topo is 3 gigs of IMG files, and OSM files are 4
times larger, so that's 12 gigs of data (including the contour
lines), so I don't know ... the roads would probably be a smaller load
than the natural features, POIs and the other databases to be loaded.
So we'll need to have a chart showing which databases are being loaded
and what the status of each is.
So for those particular tiles which cover a large amount of data,
having everyone at it (adjusting ghost lines) around the same time
wouldn't be practical. ... Using a script ("Hi road, you have the same
name and the same type, so I won't bother joining OSM") would work.
... The problem with that is, of course, future imports ... BUT
because the script says ("Hi road ...") when those roads are already in
place on OSM, the script wouldn't allow the import of that new GeoBase data.
So we should see if we can show and prove it,
i.e. in the test area, remove the GeoBase data and add, as an OSM user,
a road with the same name and same class ... then import the GeoBase data
again, to see if the script picks it up and asks that question.
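
To make the "Hi road" check concrete, here is a minimal Python sketch. It assumes each road is just a dict of tags with `name` and `highway` keys; the function names are made up, and the real conversion script may work quite differently:

```python
from typing import Iterable, List, Tuple


def road_key(road: dict) -> Tuple[str, str]:
    """Name (lower-cased, extra spaces removed) plus road type."""
    name = " ".join(road.get("name", "").lower().split())
    return (name, road.get("highway", ""))


def filter_new_roads(
    geobase_roads: Iterable[dict], osm_roads: Iterable[dict]
) -> List[dict]:
    """Keep only the GeoBase roads with no same-name, same-type road in OSM.

    Unnamed OSM roads are ignored when matching, so unnamed GeoBase roads
    are always kept.
    """
    existing = {road_key(r) for r in osm_roads if r.get("name")}
    return [r for r in geobase_roads if road_key(r) not in existing]
```

The test described above would then be: put one hand-made road into the `osm_roads` list, run the GeoBase tile through `filter_new_roads`, and check that the matching segment is left out. Note that matching on name and type alone would also skip a different road that happens to share both within the same tile, so this only makes sense tile by tile, with people checking the result.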

Solution: start with 001K11, which is not St. John's NFLD ... but
south of it, and try :) ... Maybe start with land & water features
first?

Issue: Attempting to map data already done.
As Richard pointed out (about the render=no) ... there has been an
example where a user set out to map a parking area, and by the
time they got home, someone else had also mapped it, a few hours before
or after, having had the same idea ... so all they could do was look a
little deeper and see what other features were missing that they
could possibly add.
The solution to that person's dilemma is this: by contacting all the
mappers in the area, and physically meeting them, you can get an idea
of what kinds of things people would like to map. ... It's common
to start with your local area and build outwards ... so if you are
out mapping, announce it, and see if others want to join in.

If render=no were an option for Toronto, after importing the data
and running the script, with only the 'maybes' being set to render=no,
... the first task for the user would be to hold off a little, and wait
or help out, until the import process is done for the area.
If everyone in the tile area knows that importing is going on, the
likelihood of mappers being disappointed is minimal.

The priority is this:
-Make sure the import script checks whether a road's name & type are
already in there; if so, it (or that road segment) doesn't get imported.
-Make sure that all data that can be imported does get imported, and
that the script can tell the duplicates apart.
-Take our time, going through each tile slowly, to make sure all
duplicates and errors get fixed for the next run ... so reverting
back would be minimal.

Maybe that helps?
Cheers,
Sam

P.S. Feel free to add to the wiki :) & ya, this discussion is delayed
by a few days ... but it's worth it, to make it right :)

On 11/26/08, richard at weait.com <richard at weait.com> wrote:
>> On Wed, 26 Nov 2008, Sam Vekemans wrote:
>>
>>> Well, remember (last week i think it was) when OpenStreetMap was shut
>>> down for maintenance?
>>> Well, what about convincing the foundation to shut down the server so
>>> then all the data can be uploaded at once?
>>> That would fix the problem that you had.  :)
>>
>> If we want to do a progressive import (small tile by small tile) then
>> this won't work, we aren't talking about one server shutdown but many.
>> I'm also not so sure the rest of the OSM community is keen on outages
>> for data imports.  We might be better off writing scripts to detect (and
>> maybe fix/revert?) conflicts after the fact.
>
> I think asking OSM to shut down so we can play is unlikely to win us
> friends.  And I don't think that it is required.  There was much more data
> imported from TIGER than we have from GeoBase, and that was done county by
> county I believe.
>
> GeoBase tiles may be a rough equivalent in size to the county uploads from
> TIGER.  I've emailed one of the TIGER import folks and asked him to join
> us here on talk-ca.
>
> I also think that uploading everything and hiding some / all of it is a
> bad idea.  We know that tagging for the renderer is sub-optimal and that
> things should be tagged "correctly" so that future renderers and editors
> will "get it".
>
> Needless duplication of data (say OSM Toronto, plus Toronto on GeoBase) is
> wasteful of our resources in terms of database space and bandwidth to
> editors.
>
> I also see potential trouble with making additions and changes to any
> "overlaid" Toronto data.  Imagine that you spend an afternoon adding bike
> routes and bus routes as relations, but didn't notice that half of the
> ways you worked on were "render=no".  Or that you did notice and just
> changed them to render=yes because of course you want to see your
> relations render....
>
> I'm very excited that we have this wonderful data contribution and that we
> have such an enthusiastic and energetic group to participate in the
> discussion and import.
>
> I think we should take a measured approach and delicate steps.  TIGER took
> months to upload, and had at least one false start.  We don't have a
> deadline to include the GeoBase data.  Let's find a way to include it that
> makes it super easy to accept updates from GeoBase in future (hello, road
> names, I'm talking to you).  And let's avoid three or four uploads of
> everything, then rollbacks, then uploads again.  Nobody wants to see
> Canada rendered then unrendered like a web site that over-uses the
> <blink> tag.
>
> Best regards,
> Richard
>
>
>



