[Imports] Proposed import: Geelong building data
daniel.oconnor at gmail.com
Sun Jan 5 01:52:40 UTC 2014
I seem to have lost the previous replies :S
On Sat, Dec 28, 2013 at 2:30 PM, Jason Remillard
<remillard.jason at gmail.com>wrote:
> Hi Daniel,
> FYI, the wiki page has a link to the full data set. It is ~ 125K
> buildings. Probably 125 change sets.
Yeah, I was going to go a lot smaller than that in all likelihood, and use
JOSM's X-objects-per-changeset features.
> Do you have address data too?
There is open address data covering the whole state, but no one has yet done
the explicit permission and other steps needed to make it suitable.
> - The tagging seems fine. You might want to put the date of the source data
> in the changeset comments.
Will do.
> This is a lot of data. You need to break it up, something like 1000
> buildings per OSM/OSC file. If you know python and linux I can send you a
Send away. I'd do it with a bit of xpath or even manually, but because
there are relations, those tend to get broken the few times I've tried.
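A minimal sketch of one way to do the splitting, using only the standard library. It assumes building ways only and copies each way's referenced nodes into the same chunk so every output file is self-contained; relations would need their member ways and nodes pulled along in the same manner. The function name and chunk size are illustrative.

```python
# Sketch: split an .osm file of building ways into chunks of at most
# `chunk_size` buildings, copying each way's referenced nodes into the
# same chunk so each output file stands alone.
# Assumes ways only; relations would need their members carried over too.
import xml.etree.ElementTree as ET

def split_osm(xml_text, chunk_size=1000):
    root = ET.fromstring(xml_text)
    nodes = {n.get("id"): n for n in root.findall("node")}
    ways = root.findall("way")
    chunks = []
    for i in range(0, len(ways), chunk_size):
        chunk = ET.Element("osm", root.attrib)
        seen = set()
        for way in ways[i:i + chunk_size]:
            # copy the nodes this way references before the way itself
            for nd in way.findall("nd"):
                ref = nd.get("ref")
                if ref not in seen:
                    seen.add(ref)
                    chunk.append(nodes[ref])
            chunk.append(way)
        chunks.append(ET.tostring(chunk, encoding="unicode"))
    return chunks
```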
> Doing the building overlap check with the mapnik image layer is a bad
> idea. You can automate it with postgis, do it with QGIS, or in JOSM by
> downloading the OSM data into the source OSM files, and running the JOSM
> validator.
So I really intend here to keep a human in the loop as much as possible.
While the JOSM validations will catch crossed buildings, the data would
only be uploaded to areas where it is immediately and obviously clear there
is no existing content.
There are a few duplicated way/node validation errors that need to be
checked fully, but most are due to multipolygons being turned into
relations, and being shared between multiple buildings.
> If the source data is very good and you do the overlap check ahead of time
> (postgis, or QGIS), you can have a set of buildings that don't overlap that
> can be uploaded with less checking and a set that do overlap that will need
> to be conflated individually. The buildings that are known to overlap with
> OSM you might end up keeping all of the OSM data, but add in the height and
> name tags. This only makes sense if the source data is really good.
> Otherwise, you may need to pick through everything by hand.
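The two-set split proposed above can be prototyped without PostGIS; a crude bounding-box pass in plain Python stands in for a real ST_Intersects check on the footprints. Bounding boxes can only over-report overlaps, never miss one, so the "clean" set stays safe; the tuple layout and function names are assumptions for illustration.

```python
# Crude overlap pre-filter: partition candidate buildings into those whose
# bounding box touches no existing OSM building (uploadable with less
# checking) and those that overlap something (to be conflated by hand).
# A real run would use PostGIS ST_Intersects on the actual footprints.

def bbox_overlaps(a, b):
    """Axis-aligned boxes given as (min_lon, min_lat, max_lon, max_lat)."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def partition_by_overlap(candidates, existing):
    clean, conflate = [], []
    for box in candidates:
        if any(bbox_overlaps(box, e) for e in existing):
            conflate.append(box)
        else:
            clean.append(box)
    return clean, conflate
```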
It might seem like a tremendous amount of effort to do this, but I'd err on
the side of human-in-the-loop rather than automation.
I'd figured I would just omit completely anything existing on the first
pass. For simple buildings, it's preferable to keep the OSM version; but
for complex buildings that have been traced from Bing, this dataset is more
accurate.
It's easy to separate the two classes of buildings (imported vs existing)
by querying for content without a height tag.
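A minimal sketch of that separation, with features as plain tag dictionaries; the representation is an assumption for illustration, not any particular library's.

```python
# Split features into imported content (carries a height tag from this
# dataset) and pre-existing OSM content (no height tag).

def split_by_height_tag(features):
    imported = [f for f in features if "height" in f]
    existing = [f for f in features if "height" not in f]
    return imported, existing
```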
> You will want the JOSM validator to be clean on your OSM/OSC files. For
> example:
> - Negative building heights.
Manually checked and removed these. 29 / 125,000.
Additionally, anything that was 0.0 height was removed, as the majority
were shade sails, roofs, or other things suspended above ground but with no
structure at ground level.
Buildings < 0.5 metres might be removed as well.
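The height cleanup above can be sketched as a single filter. The 0.5 m cutoff mirrors the tentative threshold, and the tag parsing is an assumption about the source format.

```python
# Drop buildings whose height tag is missing, unparseable, negative, zero,
# or below a minimum plausible value (0.5 m, per the tentative threshold).

def keep_building(tags, min_height=0.5):
    try:
        height = float(tags["height"])
    except (KeyError, ValueError):
        return False  # no parseable height: exclude from the import
    return height >= min_height
```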
> - The source data has duplicates with itself.
As mentioned above, most appear to be part of relations. These wouldn't be
imported without specific cleanup. JOSM tells me there are 20/125,000 in
total.
> - The multipolygons have building=yes on the ways and the relation. It should be
> either on the outer way or the relation, but not on the inner ways.
I'll likely try to omit these until last and remodel by hand.
> - Some round features are a bit overnoded.
Do you have specific examples, or a way to query for those? I'll use
simplify on them as appropriate.
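JOSM's simplify action is essentially Douglas-Peucker; a pure-Python sketch of the same idea, for checking how a tolerance would affect an overnoded way before touching it in the editor. The tolerance is in coordinate units and purely illustrative.

```python
# Douglas-Peucker line simplification: drop vertices closer than
# `tolerance` to the chord between the kept endpoints.
import math

def _point_segment_dist(p, a, b):
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # project p onto the segment, clamped to its ends
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(points, tolerance):
    if len(points) < 3:
        return list(points)
    # find the vertex farthest from the chord between the endpoints
    dists = [_point_segment_dist(p, points[0], points[-1]) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i - 1] <= tolerance:
        return [points[0], points[-1]]
    left = simplify(points[:i + 1], tolerance)
    right = simplify(points[i:], tolerance)
    return left[:-1] + right
```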
> - duplicate nodes.