[talk-au] Victorian Vicmap Address Import Proposal

Andrew Harvey andrew at alantgeo.com.au
Fri May 21 11:23:51 UTC 2021


> Please don't do that. Now that we have a complete set of admin_level 10 
> boundaries in Australia addr:suburb is now redundant.
> 
> Postcodes can be added once to the level 10 admin boundary or as a 
> separate postal_code boundary if they don't align.

Noted. This is a point of contention, with some advocating to include these attributes and others to exclude them.

I can see arguments either way.

For now the iD preset doesn't show any inherited attributes, so it will prompt mappers to supply these fields. While it might be redundant, it's also not wrong. Would you go so far as removing these fields when other mappers add them manually?

If we tolerate mappers manually entering these, then I'd say filling in these values is worth it: it shows the address as complete, and prevents other mappers manually adding information we could have just imported anyway.

If you are relying on the level 10 boundary for addr:suburb, that means data consumers need to know that for Australia, addr:suburb comes from admin_level 10. In other regions it might be different and this information isn't really stored in OSM.
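To illustrate what that asks of a data consumer, here is a minimal sketch of deriving a suburb name from boundary polygons with a point-in-polygon test. It assumes each boundary is a single simple ring of (lon, lat) pairs; real admin_level 10 relations can be multipolygons with holes, and the function names and the "Suburb A"/"Suburb B" data are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat)


def point_in_ring(point: Point, ring: Sequence[Point]) -> bool:
    """Ray-casting test: count how many ring edges a ray going east crosses."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def suburb_for_address(point: Point,
                       boundaries: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Return the name of the first boundary containing the point, if any."""
    for name, ring in boundaries.items():
        if point_in_ring(point, ring):
            return name
    return None


# Hypothetical boundaries: two adjacent unit squares.
boundaries = {
    "Suburb A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "Suburb B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
print(suburb_for_address((0.5, 0.5), boundaries))
```

The consumer also has to know which admin_level to load as the suburb boundary in the first place, which is the part that isn't recorded in OSM.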

> 
> > `addr:state` is less important given these addresses fall within the 
> > Victoria state admin boundary already. The wiki touches on this saying 
> > "A few mappers consider higher-level tags, or even addr:city=* as 
> > redundant, since they could be calculated from the respective boundary 
> > relations they are contained in (if present and valid). However, such 
> > practice has severe disadvantages and can lead to wrong results."
> 
> If you read the explanation on the talk page this does not apply to 
> Australia because we have contiguous postal areas.

Yes, and my research on the data seems to confirm we won't have any issues (except maybe a handful) with relying on the boundary locality and postcode.

> It is actually harmful because you have to maintain the same piece of 
> information in thousands of places. Not only that having multiple 
> sources of information seems to confuse Nominatim. On the other hand 
> adding it has no real benefit, so I would suggest that you don't import 
> anything beyond the street name.

I agree that it does become extra maintenance when localities change. If we know there are no exceptions to the rule (addr:suburb is always based on the suburb/locality boundary), then it wouldn't be too much work to update the addresses when boundaries update, but as a mass change it's riskier and needs community consultation etc., so it becomes a bit of work.

On Thu, 20 May 2021, at 8:09 PM, Sebastian S. wrote:
> I have not read the code, will have a look but not sure how much I will understand. Therefore I'm asking how do you determine the location of the POI to be added?
> 
> In the NSW the address data was part of an area of the plot of land the address is for. So as part of the import process the area was converted to a node location.
> Long driveways or other thin parts of the plot resulted in the node often being outside of the actual area.
> 
> Also most plot of lands have the house towards one end and garden in the other. This results in the node outside of the building.

The Vicmap address data is point data, so for this import we don't fiddle with the location supplied.

I touch on this at https://gitlab.com/alantgeo/vicmap2osm#where-should-addresses-exist: local mappers can choose to leave the point as is, merge it into a building, or merge it into another amenity/shop/office object post-import.

> I assume that you do not intend to manually correct node locations such that they are on top of an unit.

In the current code, overlapping points are reduced to one when they differ only by unit value. Sometimes there are different address points at the same location; it's not yet decided whether we should leave them overlapping, skip them in the import, or offset them slightly.
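As a rough sketch of the first case (this is not the vicmap2osm code), points sharing the same coordinates, house number and street can be grouped and their unit values collected onto one point. The function name, the dict layout, and writing the units out as a semicolon-separated addr:flats value are assumptions for illustration.

```python
from collections import defaultdict


def collapse_units(points):
    """Merge points that differ only by addr:unit into a single point."""
    groups = defaultdict(list)
    for p in points:
        key = (p["lon"], p["lat"], p["addr:housenumber"], p["addr:street"])
        groups[key].append(p)

    result = []
    for (lon, lat, number, street), members in groups.items():
        if len(members) == 1:
            # Nothing overlaps with this point, keep it unchanged.
            result.append(dict(members[0]))
            continue
        merged = {
            "lon": lon,
            "lat": lat,
            "addr:housenumber": number,
            "addr:street": street,
        }
        # Plain string sort, so "10" sorts before "2".
        units = sorted({m["addr:unit"] for m in members if m.get("addr:unit")})
        if units:
            merged["addr:flats"] = ";".join(units)
        result.append(merged)
    return result


# Hypothetical input: three units at one location, one unrelated address.
points = [
    {"lon": 145.0, "lat": -37.8, "addr:unit": "1",
     "addr:housenumber": "10", "addr:street": "Example Street"},
    {"lon": 145.0, "lat": -37.8, "addr:unit": "2",
     "addr:housenumber": "10", "addr:street": "Example Street"},
    {"lon": 145.0, "lat": -37.8, "addr:unit": "3",
     "addr:housenumber": "10", "addr:street": "Example Street"},
    {"lon": 145.0, "lat": -37.8,
     "addr:housenumber": "12", "addr:street": "Example Street"},
]
collapsed = collapse_units(points)
```

Note that the 12 Example Street point stays separate even though it shares a location with the others: that is the second, still undecided case.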

On Thu, 20 May 2021, at 8:39 PM, Andrew Davidson wrote:
> It would be interesting to know how many addresses there are mapped 
> already (and how far they are from the Vic dataset) as that would give 
> an idea as to how much work there would be in resolving which one to use.

Based on the latest snapshots and codebase I'm using,

Vicmap has 3,915,711 addresses
With the existing process in place to convert units to flats and remove duplicates, this comes down to 2,755,079 addresses.
OSM has 192,592 addresses (addr:housenumber or addr:interpolation)

Of the import candidates:

1,846,792 addresses exist within blocks with no existing OSM addresses and can safely be imported without conflicts
869,979 addresses are within blocks that have existing OSM addresses, but the addr:housenumber/addr:street don't exactly match (so likely can be imported automatically, but I need to do more QA on this)
139,410 addresses exactly match addr:housenumber/addr:street in OSM
50,539 Vicmap addresses fall within an OSM address polygon (no comment on if the address is the same or not)

I'm still working on the code to produce the final import candidates, so these numbers (and groups) will change.
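To make the groups above concrete, here is a minimal sketch of how a single candidate might be sorted into one of them. This is not the vicmap2osm code: the function name, the outcome labels, and the order in which the checks are applied are all assumptions for illustration.

```python
def classify(candidate, block_addresses, in_address_polygon):
    """Return a conflation outcome label for one import candidate.

    candidate           -- dict of tags for the Vicmap address
    block_addresses     -- existing OSM address dicts in the same block
    in_address_polygon  -- True if the point falls inside an OSM polygon
                           that already carries an address
    """
    if in_address_polygon:
        return "within_osm_address_polygon"
    if not block_addresses:
        return "no_osm_addresses_in_block"
    key = (candidate.get("addr:housenumber"), candidate.get("addr:street"))
    existing = {
        (a.get("addr:housenumber"), a.get("addr:street"))
        for a in block_addresses
    }
    if key in existing:
        return "exact_match"
    return "no_exact_match_in_block"


# Hypothetical data.
candidate = {"addr:housenumber": "10", "addr:street": "Example Street"}
neighbour = {"addr:housenumber": "12", "addr:street": "Example Street"}
print(classify(candidate, [neighbour], False))
```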

On Fri, 21 May 2021, at 11:11 AM, Daniel O'Connor wrote:
> Could you elaborate a bit on the sequence or chunking of the data, and how you'd go about importing/QAing, and rough timelines?

The plan is to split up the data based on the conflation outcome (for some addresses we are more certain we can import without issues) and then again by suburb.

Splitting by suburb ensures the changesets are a manageable size and can be reviewed in OSMCha etc.
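The two-level split could look something like the sketch below, where each (outcome, suburb) pair ends up as one chunk that could be uploaded as its own changeset. The "outcome" and "suburb" keys and the function name are hypothetical, not fields from the actual import code.

```python
from collections import defaultdict


def chunk_for_import(candidates):
    """Group candidates by conflation outcome first, then by suburb."""
    chunks = defaultdict(list)
    for c in candidates:
        chunks[(c["outcome"], c["suburb"])].append(c)
    return dict(chunks)


# Hypothetical candidates.
candidates = [
    {"id": 1, "outcome": "no_conflict", "suburb": "Suburb A"},
    {"id": 2, "outcome": "no_conflict", "suburb": "Suburb A"},
    {"id": 3, "outcome": "no_conflict", "suburb": "Suburb B"},
    {"id": 4, "outcome": "needs_review", "suburb": "Suburb A"},
]
chunks = chunk_for_import(candidates)
for (outcome, suburb), members in chunks.items():
    print(outcome, suburb, len(members))
```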

Timelines TBA. It depends on when I get the code and the full wiki import write-up done, any outstanding community feedback issues are resolved, and enough notice has been given before the import goes ahead.

> The per suburb approach + review approach used by https://gitlab.com/dionmoult/osm-nsw-address-import for example might be a good level of granularity and process to follow.

I do plan to do it by suburb, and further splitting by import confidence should reduce manual review. I probably won't do manual review of every address, so only the high confidence ones will go in; others will probably go out to MapRoulette. To be determined.


