[Talk-ca] Talk-ca: Bulk Import of Address Range in GTHA from Metrolinx, Second attempt

John Marshall rps333 at gmail.com
Wed Feb 10 14:44:31 UTC 2016


This is all great news. I have been working to get the Federal
Government in Ottawa to use OSM for years, and this will help to get
even more organizations on board.


John Marshall

On Fri, Feb 5, 2016 at 11:09 AM, Mojgan Jadidi <mojgan.jadidi at gmail.com> wrote:
> Dear all,
> Following our import plan discussion, please take a moment to read more
> details and clarifications about the process.
>
> Background:
>
> When Triplinx was first launched, we received a lot of feedback regarding
> missing addresses. We explored a number of potential solutions (including
> building our own local copy of the OSM database merged with the StatCan
> address data), but decided that the best solution for both Triplinx and the
> OSM community was to add the data directly to OSM. This meant that we had to
> spend quite a bit of time ensuring that the data was correct, as we knew
> there was a greater element of accountability when publishing to OSM (as
> opposed to merging with a local copy of the database).
>
>
>
> Our Initial Import:
>
> I think there is still a bit of a misunderstanding about what we actually
> did. While we did use JOSM to import all of our data at once, we also spent
> several weeks prior to the upload manually reviewing the data in JOSM to
> resolve issues such as duplication and node conflicts. Originally we had
> intended to split the import into much smaller chunks, but JOSM was capable
> of handling our entire dataset and the modifications to it, so we decided to
> do a single JOSM upload. While this certainly doesn’t excuse our lack of
> communication about the process, I do want to emphasize that there was quite
> a bit of manual review before the upload.
>
>
>
> Process of Identifying and Creating Missing Address Ranges:
>
> Our process focused solely on address ranges, and the goal was to identify
> the gaps in the existing address ranges and populate these gaps with address
> ranges generated from StatCan Road Network data. We aimed to replicate the
> structure of the existing CanVec address range data.
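The post does not spell out how the generated ranges are tagged. As an illustration only, assuming they follow the common OSM address-interpolation scheme (a way tagged `addr:interpolation` whose two end nodes carry `addr:housenumber` and `addr:street`), the sketch below writes one such way into a JOSM-style .osm XML file. The function name, the `generator` string and the parity rule are assumptions, not details of the Metrolinx process or of the CanVec data.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def interpolation_way_xml(coords: List[Tuple[float, float]], street: str,
                          start_number: int, end_number: int) -> str:
    """Build a JOSM-style .osm document holding one address-interpolation way.

    Hypothetical helper. coords are (lat, lon) pairs of the already shifted
    and trimmed line. New objects get negative ids, which JOSM treats as
    not-yet-uploaded data.
    """
    if len(coords) < 2:
        raise ValueError("an interpolation way needs at least two nodes")
    # Assumed parity rule: same parity at both ends -> even/odd, else all.
    if start_number % 2 == end_number % 2:
        interpolation = "even" if start_number % 2 == 0 else "odd"
    else:
        interpolation = "all"

    osm = ET.Element("osm", version="0.6", generator="address-range-sketch")
    node_ids = []
    for i, (lat, lon) in enumerate(coords, start=1):
        node = ET.SubElement(osm, "node", id=str(-i),
                             lat=f"{lat:.7f}", lon=f"{lon:.7f}")
        node_ids.append(-i)
        # Only the two end nodes carry a house number and street name.
        if i in (1, len(coords)):
            number = start_number if i == 1 else end_number
            ET.SubElement(node, "tag", k="addr:housenumber", v=str(number))
            ET.SubElement(node, "tag", k="addr:street", v=street)

    way = ET.SubElement(osm, "way", id=str(-(len(coords) + 1)))
    for node_id in node_ids:
        ET.SubElement(way, "nd", ref=str(node_id))
    ET.SubElement(way, "tag", k="addr:interpolation", v=interpolation)
    return ET.tostring(osm, encoding="unicode")
```

For instance, a two-node line with numbers 2 and 20 on a hypothetical "Example Street" yields one way tagged `addr:interpolation=even`, with the house numbers on its end nodes.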
>
>
>
> At a high level, our process of identifying gaps in the address ranges is
> summarized below:
>
>
>
> ·   For each side of each StatCan road segment with a valid address
> range (start value and end value exist and are different):
>
> o   Create a buffer on the appropriate side (left or right) of the street
> segment
>
> o   Find all OSM address ranges that fall within the buffer and compute the
> intersection of these ranges with the buffer (i.e. keep only the portion
> within the buffer)
>
> o   For each OSM address range within the buffer:
>
> -  Localize (project) the start and end of the address range onto the
> street segment
>
> -  Compute the distance along the segment from the localized start
> coordinate to the localized end coordinate
>
> o   If the sum of these distances divided by the length of the segment is
> less than the threshold (we used 0.2):
>
> -  This address range is poorly represented by existing OSM data, and is a
> good candidate to be added
>
> -  The segment is shifted to the appropriate side and trimmed (to replicate
> the structure of the CanVec data) and is added to an XML file
>
> o   Else:
>
> -  This address range is likely already represented by existing OSM address
> data
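The gap test in the list above can be sketched in Python. This is a minimal illustration, not the Metrolinx code. It assumes planar coordinates in metres, treats each StatCan segment as a single straight line (real segments are polylines), models the one-sided buffer as a rectangle `buffer_width` wide, reduces each OSM range to a straight line between its two ends, and clips with the Liang-Barsky algorithm. All function names are hypothetical; only the 0.2 threshold comes from the post.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]   # planar (x, y), assumed metres in a projected CRS
Range = Tuple[Point, Point]   # an OSM range reduced to its two end points

THRESHOLD = 0.2  # the coverage threshold quoted in the post


def has_valid_range(start_num: Optional[int], end_num: Optional[int]) -> bool:
    """A side qualifies only if both house numbers exist and differ."""
    return start_num is not None and end_num is not None and start_num != end_num


def to_segment_frame(p: Point, a: Point, b: Point, side: str) -> Point:
    """Return (distance along a->b, offset from the street toward `side`)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    ux, uy = dx / length, dy / length
    px, py = p[0] - a[0], p[1] - a[1]
    along = px * ux + py * uy
    left_offset = ux * py - uy * px  # cross product: > 0 means left of a->b
    return along, (left_offset if side == "left" else -left_offset)


def clip_to_box(p0: Point, p1: Point, xmax: float, ymax: float) -> Optional[Range]:
    """Liang-Barsky clip of p0-p1 to [0, xmax] x [0, ymax]; None if outside."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    for p, q in ((-dx, x0), (dx, xmax - x0), (-dy, y0), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return None  # parallel to this edge and outside it
            continue
        t = q / p
        if p < 0:  # the line enters the box through this edge
            if t > t1:
                return None
            t0 = max(t0, t)
        else:      # the line leaves the box through this edge
            if t < t0:
                return None
            t1 = min(t1, t)
    return (x0 + t0 * dx, y0 + t0 * dy), (x0 + t1 * dx, y0 + t1 * dy)


def coverage_ratio(a: Point, b: Point, side: str,
                   osm_ranges: List[Range], buffer_width: float) -> float:
    """Sum of clipped OSM range lengths along a->b, divided by segment length."""
    length = math.hypot(b[0] - a[0], b[1] - a[1])
    if length == 0:
        raise ValueError("street segment has zero length")
    covered = 0.0
    for start, end in osm_ranges:
        s = to_segment_frame(start, a, b, side)
        e = to_segment_frame(end, a, b, side)
        clipped = clip_to_box(s, e, length, buffer_width)  # one-sided buffer
        if clipped is not None:
            covered += abs(clipped[1][0] - clipped[0][0])  # distance along street
    return covered / length


def side_needs_range(a: Point, b: Point, side: str, start_num, end_num,
                     osm_ranges: List[Range], buffer_width: float,
                     threshold: float = THRESHOLD) -> bool:
    """True if this side has a valid StatCan range but little OSM coverage."""
    if not has_valid_range(start_num, end_num):
        return False
    return coverage_ratio(a, b, side, osm_ranges, buffer_width) < threshold
```

For example, on a 100 m segment from (0, 0) to (100, 0), a single left-side OSM range from (10, 5) to (20, 5) gives a ratio of 0.1, so with a 15 m buffer the left side would be flagged as a candidate. Summing lengths, as the post describes, double-counts overlapping OSM ranges; merging the clipped intervals first would be a stricter variant.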
>
>
>
> Duplication:
>
> No process is perfect, so I certainly wouldn’t expect our data to be
> entirely free of duplication. Our process is particularly susceptible to
> duplication when the addresses are represented by existing address nodes
> (rather than interpolated ways). That being said, I believe we have manually
> reviewed our data extensively to remove this sort of duplication, and we
> intend to continue doing so if/when the import is complete.
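The node case is likely harder because the gap test measures only the length of address ranges inside the buffer, so a street whose houses are mapped as individual address nodes can still look uncovered. As a hypothetical extra check, not something described in the post, a candidate range could be compared against existing address nodes on the same street before upload. The sketch assumes nodes are already reduced to (street, housenumber) pairs and skips non-numeric house numbers such as `12A`; the class and function names are illustrative.

```python
from typing import List, NamedTuple


class AddressNode(NamedTuple):
    street: str
    housenumber: str  # kept as text: OSM values may look like "12A"


class CandidateRange(NamedTuple):
    street: str
    start: int
    end: int


def overlapping_nodes(candidate: CandidateRange,
                      nodes: List[AddressNode]) -> List[AddressNode]:
    """Existing address nodes that a candidate range would likely duplicate."""
    lo, hi = sorted((candidate.start, candidate.end))
    single_parity = candidate.start % 2 == candidate.end % 2
    hits = []
    for node in nodes:
        # Case-insensitive street match; real data may need more normalisation.
        if node.street.casefold() != candidate.street.casefold():
            continue
        if not node.housenumber.isdigit():
            continue  # this sketch skips forms like "12A" or "10-12"
        number = int(node.housenumber)
        in_range = lo <= number <= hi
        parity_ok = not single_parity or number % 2 == candidate.start % 2
        if in_range and parity_ok:
            hits.append(node)
    return hits
```

A candidate with a non-empty result could then be sent back for manual review instead of being uploaded.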
>
>
>
> Benefit To The Community:
>
> The vast majority of the address range data seems to come from some version
> of the CanVec data. While this data is reasonably comprehensive in some
> areas, it also has quite a few gaps. By adding the StatCan data, we can fill
> some of these gaps consistently from a single data source.
> From our perspective, adding address data for areas/streets that don’t have
> this data is a step in the right direction.
>
>
>
> Please find attached a portion of our generated data.
>
>
> For details about the import plan, see:
>
> https://wiki.openstreetmap.org/wiki/Triplinx_Metrolinx_Import_Plan
>
> https://wiki.openstreetmap.org/wiki/User_talk:Triplinx.canada
>
> http://wiki.openstreetmap.org/wiki/Import/Catalogue
>
> http://wiki.openstreetmap.org/wiki/Contributors#Triplinx_Metrolinx
>
>
> For those who want to check the initial changesets that were reverted by the DWG:
>
> https://www.openstreetmap.org/changeset/36946223
> https://www.openstreetmap.org/changeset/36944498
> https://www.openstreetmap.org/changeset/36943764
> https://www.openstreetmap.org/changeset/36942733
> https://www.openstreetmap.org/changeset/36940905
> https://www.openstreetmap.org/changeset/36939163
>
>
> We look forward to your feedback,
>
>
>
> Sincerely yours,
>
>
> Mojgan
>
>
> Mojgan (Amaneh) Jadidi, Ph.D.
> Intern, Applied Research & Corporate Monitoring
> Planning & Policy | Metrolinx | 97 Front Street West, Toronto, ON, M5J 1E6 |
> T: 416-202-5844
>
> Postdoctoral Research Fellow
> GeoICT Lab | York University | 4700 Keele St, Toronto, ON, M3J 1P3
>
> _______________________________________________
> Talk-ca mailing list
> Talk-ca at openstreetmap.org
> https://lists.openstreetmap.org/listinfo/talk-ca
>


