[Talk-ca] Talk-ca: Bulk Import of Address Range in GTHA from Metrolinx, Second attempt

Mojgan Jadidi mojgan.jadidi at gmail.com
Fri Feb 5 16:09:46 UTC 2016

Dear all,
Following up on our import plan discussion, please take a moment to read
further details and clarifications about the process.


When Triplinx was first launched, we received a lot of feedback regarding
missing addresses. We explored a number of potential solutions (including
building our own local copy of the OSM database merged with the StatCan
address data), but decided that the best solution for both Triplinx and the
OSM community was to add the data directly to OSM. This meant that we had
to spend quite a bit of time ensuring that the data was correct, as we knew
there was a greater element of accountability when publishing to OSM (as
opposed to merging with a local copy of the database).

*Our Initial Import:*

I think there is still a bit of a misunderstanding about what we actually
did. While we did use JOSM to import all of our data at once, we also spent
several weeks prior to the upload manually reviewing the data in JOSM to
resolve issues such as duplication and node conflicts. Originally we had
intended to split the import into much smaller chunks, but JOSM was capable
of handling our entire dataset and the modifications to it, so we decided
to do a single JOSM upload. While this certainly doesn’t excuse our lack of
communication about the process, I do want to emphasize that there was
quite a bit of manual review before the upload.

*Process of Identifying and Creating Missing Address Ranges:*

Our process focused solely on address ranges, and the goal was to identify
the gaps in the existing address ranges and populate these gaps with
address ranges generated from StatCan Road Network data. We aimed to
replicate the structure of the existing CanVec address range data.

At a high level, our process of identifying gaps in the address ranges is
summarized below:

· For each side of each StatCan road segment with a valid address range
  (start value and end value exist and are different):

    o Create a buffer on the appropriate side (left or right) of the street

    o Find all OSM address ranges that fall within the buffer and compute
      the intersection of these ranges and the buffer (extract only the
      portion within the buffer)

    o For each OSM address range within the buffer:

        - Localize the start and end of the address range to the street
          segment

        - Compute the distance along the segment from the localized start
          coordinate to the localized end coordinate

    o If the sum of the distances divided by the length of the segment is
      less than the threshold (we used 0.2):

        - This address range is poorly represented by existing OSM data,
          and is a good candidate to be added

        - The segment is shifted to the appropriate side and trimmed (to
          replicate the structure of the CanVec data) and is added to an
          XML file

    o Else:

        - This address range is likely represented by existing OSM address
          data
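
The coverage test in the steps above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the code
we ran: it assumes planar (projected) coordinates, assumes the OSM ranges
passed in have already been clipped to the side buffer, and the names
locate, coverage_ratio and is_candidate are hypothetical. Like the
description above, it simply sums the distances, so overlapping ranges would
be counted twice.

```python
import math


def has_valid_range(start, end):
    """A side has a valid range if both values exist and are different."""
    return start is not None and end is not None and start != end


def length(line):
    """Total length of a polyline given as a list of (x, y) tuples."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(line, line[1:]))


def locate(line, pt):
    """Distance along `line` of the point on the line closest to `pt`."""
    best_d2, best_pos, walked = float("inf"), 0.0, 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = math.hypot(dx, dy)
        if seg_len == 0:
            continue  # skip degenerate pieces
        # Parameter of the orthogonal projection, clamped to this piece.
        t = ((pt[0] - x1) * dx + (pt[1] - y1) * dy) / seg_len ** 2
        t = max(0.0, min(1.0, t))
        px, py = x1 + t * dx, y1 + t * dy
        d2 = (pt[0] - px) ** 2 + (pt[1] - py) ** 2
        if d2 < best_d2:
            best_d2, best_pos = d2, walked + t * seg_len
        walked += seg_len
    return best_pos


def coverage_ratio(segment, osm_ranges):
    """Sum of localized range lengths divided by the segment length.

    `osm_ranges` is a list of (start_pt, end_pt) pairs, assumed to be
    already clipped to the buffer on the side being tested.
    """
    covered = sum(abs(locate(segment, end) - locate(segment, start))
                  for start, end in osm_ranges)
    return covered / length(segment)


def is_candidate(segment, osm_ranges, threshold=0.2):
    """True if existing OSM ranges cover less than `threshold` of the segment."""
    return coverage_ratio(segment, osm_ranges) < threshold


# Example: a 100-unit street with one existing range covering 15 units of it.
street = [(0.0, 0.0), (100.0, 0.0)]
existing = [((10.0, 5.0), (25.0, 5.0))]
add_it = has_valid_range(100, 198) and is_candidate(street, existing)
```

In this example the existing range covers 15% of the street, which is below
the 0.2 threshold, so the StatCan range for that side would be kept as a
candidate.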


No process is perfect, so I would not expect our data to be entirely free of
duplication. Our process is particularly susceptible to duplication when the
addresses are represented by existing address nodes (rather than
interpolated ways). That being said, I believe we have manually reviewed our
data extensively to remove this sort of duplication, and we intend to
continue to do so if/when the import is complete.

*Benefit To The Community:*

The vast majority of the address range data seems to come from some version
of the CanVec data. While this data is reasonably comprehensive in some
areas, it also has quite a few gaps. By adding the StatCan data, we can
fill some of these gaps in a consistent manner with a single data
source. From our perspective, adding address data for areas/streets that
don’t have this data is a step in the right direction.

Please find attached a portion of our generated data.

For details about the import plan, feel free to see:





For those who want to check the initial changesets that were reverted by the DWG:


We look forward to your feedback.

Sincerely yours,


*Mojgan (Amaneh) Jadidi, Ph.D.*
*Intern, Applied Research & Corporate Monitoring*
*Planning & Policy | Metrolinx | 97 Front Street West, Toronto, ON, M5J
1E6 | T: 416-202-5844*

*Postdoctoral Research Fellow*
*GeoICT Lab | York University | 4700 Keele St, Toronto, ON, M3J 1P3*
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OSM-XML-090.osm
Type: application/octet-stream
Size: 173235 bytes
Desc: not available
URL: <http://lists.openstreetmap.org/pipermail/talk-ca/attachments/20160205/cf413054/attachment-0001.obj>
