[Imports-us] Vermont, U.S. address import
Jared
osm at wuntu.org
Mon Oct 3 12:40:04 UTC 2022
An update on progress in planning the Vermont address import...
I had a good call and some out-of-band emails with Alex, who has been helpful
with tips and advice. As a result, I've refined my process for finding
existing OSM addresses (I am pulling from a Postgres database, as well as
using an Overpass query for any nodes or ways within a town boundary).
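For anyone curious what the Overpass side of that looks like, a sketch along these lines can pull existing address objects inside one town boundary (the query, town name, and helper names here are only illustrative, not my exact script):

```python
import json
import urllib.parse
import urllib.request

OVERPASS_URL = "https://overpass-api.de/api/interpreter"

# Illustrative Overpass QL: nodes and ways carrying addr:housenumber
# inside one administrative boundary ("Ripton" is only an example town).
TOWN_QUERY = """
[out:json][timeout:60];
area["name"="Ripton"]["boundary"="administrative"];
(
  node["addr:housenumber"](area);
  way["addr:housenumber"](area);
);
out tags center;
"""

def fetch_overpass(query: str) -> dict:
    """POST a query to the public Overpass endpoint and decode the JSON."""
    data = urllib.parse.urlencode({"data": query}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=data) as resp:
        return json.load(resp)

def count_addressed_elements(overpass_json: dict) -> int:
    """Count returned elements that actually carry addr:housenumber."""
    return sum(
        1
        for el in overpass_json.get("elements", [])
        if "addr:housenumber" in el.get("tags", {})
    )
```

The element count per town is what feeds the "existing OSM addresses" column described below.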
I've updated the wiki page
<https://wiki.openstreetmap.org/wiki/VCGI_E911_address_points_import> to
clarify parts of the plan and limit the scope. My intent is to focus on
the lowest-hanging fruit using manual verification (e.g., towns with fewer
than 100 existing OSM addresses). If I develop tools to help with
automation, I'll report back with plans for processing towns that have more
than 100 existing OSM addresses.
I've created a Google sheet here
<https://docs.google.com/spreadsheets/d/1N_vGbQENK6owBKjX-u52dTViGks5PYqv0QyhIgbdBaI/edit?usp=sharing>
that contains the towns, and how many existing OSM addresses they currently
have. It also shows how many VCGI addresses exist for the town. You can
look at this to get a sense of which towns I plan on working on first.
I've started a git repo here:
https://github.com/JaredOSM/vermont-address-import
It currently contains a script that processes street names to conform to
OSM standards.
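As a rough illustration of the kind of transformation that script performs (the lookup table here is only a small excerpt for illustration, not the script's actual table):

```python
# Small excerpt of a suffix lookup for illustration; the real script's
# tables are larger (see the repo).
SUFFIXES = {
    "Ave": "Avenue",
    "Dr": "Drive",
    "Ln": "Lane",
    "Rd": "Road",
    "St": "Street",
}

def expand_street_name(raw: str) -> str:
    """Title-case a raw street name and expand only its trailing suffix.

    Expanding only the final token avoids turning "DR ALBERT DR" into
    "Drive Albert Drive": the leading "Dr" (doctor) is left alone.
    """
    words = raw.strip().title().split()
    if not words:
        return ""
    last = words[-1].rstrip(".")
    words[-1] = SUFFIXES.get(last, words[-1])
    return " ".join(words)
```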
I've placed the VCGI address point data files there for each town that I
plan to process, and as I create OSM files, I'll store them there for
community review.
I evaluated the VCGI data for duplicate addresses (e.g., nodes that have
the exact same longitude and latitude). I only found two pairs of such
nodes in the entire state. Apartment addresses, from what I can tell, are
kept separate and are not placed on top of each other.
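That duplicate check amounts to grouping points by their exact coordinate pair; a minimal sketch (the field names are assumptions for illustration, not the actual VCGI schema):

```python
from collections import defaultdict

def find_stacked_points(points):
    """Return coordinate pairs shared by more than one address point.

    `points` is a list of dicts with "lon", "lat", and "address" keys
    (assumed names, not the actual VCGI field names). Only coordinates
    used more than once are returned.
    """
    by_coord = defaultdict(list)
    for p in points:
        by_coord[(p["lon"], p["lat"])].append(p["address"])
    return {coord: addrs for coord, addrs in by_coord.items() if len(addrs) > 1}
```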
I believe I've addressed all the questions that have been raised so far,
but if I've missed anything, or anyone else has remaining questions or
concerns, please let me know.
I will continue to wait for a while to make sure all concerns are
addressed. In the meantime, I'll plan on generating some additional draft
import files.
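For anyone wondering what those draft files will look like, the general shape is a JOSM-style .osm XML document of new nodes with negative ids; a minimal sketch (the tag set and field names here are only illustrative):

```python
import xml.etree.ElementTree as ET

def points_to_osm_xml(points) -> str:
    """Serialize address dicts into a JOSM-style .osm document.

    New objects get negative ids so editors treat them as not yet
    uploaded. Input field names are assumptions for illustration.
    """
    root = ET.Element("osm", version="0.6", generator="vt-address-draft")
    for i, p in enumerate(points, start=1):
        node = ET.SubElement(
            root, "node",
            id=str(-i), lat=str(p["lat"]), lon=str(p["lon"]),
        )
        ET.SubElement(node, "tag", k="addr:housenumber", v=p["housenumber"])
        ET.SubElement(node, "tag", k="addr:street", v=p["street"])
    return ET.tostring(root, encoding="unicode")
```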
Thanks,
Jared
On Wed, Sep 21, 2022 at 7:43 PM Jared <osm at wuntu.org> wrote:
> Alex, thanks for the encouragement, advice, and warnings.
>
> Responses to your comments below.
>
> My biggest regret is that I should have done the *municipal boundary
>> import BEFORE doing the address import*. Without those boundaries I had
>> no way to validate the addr:city tag, which ended up being unexpectedly bad
>> for a variety of reasons. This created a lot of extra work afterward that
>> would have been easier to deal with beforehand. I just glanced at Vermont and
>> it looks like you don't have your municipal boundaries either so *this
>> warning applies to you*.
>
>
> I'd like to chat with you more about this as it sounds important.
>
> *"The goal is to import missing Vermont addresses."*
>> I suggest having accuracy as part of your goal. You don't need to
>> publicly announce it, but it will help you evaluate decisions. 99%
>> accuracy would be awesome; 95% accuracy would be a little sad.
>>
>
> With the Maine import, how did you assess this accuracy? Or, how would
> you suggest I go about determining accuracy?
>
> *"Larger towns ... skipped"*
>> Please keep a list of skipped towns on the wiki so others can follow in
>> your footsteps.
>>
>
> I'm thinking of having a table of town names on the wiki page with their
> progress. Let me know if you came up with a good system for Maine.
>
>
>> *"Esri has evaluated the data set"*
>> I reviewed many of the address data sets that Esri published to RapiD and
>> found they didn't even attempt any validation, which was extremely
>> troubling. Please don't accept Esri's review as an endorsement of data
>> quality. For example, you have several "<tag k='addr:housenumber' v='0' />"
>> in your sample OSM file. I did a frequency analysis of Maine numbers and
>> discovered that the state used house number "999" as "we don't know this
>> house number". Consider doing similar with your data. They should be
>> positive, non-zero, numeric, non-empty, and there shouldn't be any
>> unusually high occurrences of any single number. There shouldn't be any
>> duplicates. Data quality will vary greatly town by town. You need to
>> re-validate each town independently because *they will have different
>> problems*.
>>
>
> I've improved my script to further validate house numbers (they must
> exist, be numeric, and be greater than zero).
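> For the curious, that check is roughly the following (a sketch; the real
> script lives in the repo linked above):

```python
def is_valid_housenumber(value) -> bool:
    """Rough sketch of the house-number check: present, numeric, > 0."""
    if value is None:
        return False
    text = str(value).strip()
    if not text.isdigit():
        return False
    return int(text) > 0
```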
>
>
>> *"Tagging Plans"*
>> You've done the obvious address translations; there may be more useful
>> data in the data source which could translate to other OSM tags. If you
>> post an example record from the data source then reviewers may be able to
>> spot those. You didn't mention apartment numbers?
>>
>
> The source data does include site type (Single family house, mobile home,
> etc.) but on the local-vermont Slack channel we decided the site type
> should be associated with the building outline, not the address point.
>
> Relevant source data links are here. If anyone else sees other pieces
> that should be included, let me know.
> About the VCGI dataset:
> https://geodata.vermont.gov/datasets/VCGI::vt-data-e911-site-locations-address-points-1/about
> View the data in table form:
>
> https://geodata.vermont.gov/datasets/VCGI::vt-data-e911-site-locations-address-points-1/explore?showTable=true
> Metadata about the fields:
> https://maps.vcgi.vermont.gov/gisdata/metadata/EmergencyE911_ESITE.htm
>
> *"Data Transformation: Title Cases"*
>> You *can* do that but it will be wrong sometimes for things like
>> "McJagger's Lane". I used a script which pulled the character casing from
>> nearby OSM roads with the same name spelling (ignoring whitespace,
>> punctuation and accents). It wasn't too much work and it produced very good
>> results.
>>
>
> I've updated the script to capitalize the letter after Mc. I'm sure there
> are exceptions, and other types of non-trivial capitalization and
> punctuation. I don't currently have the skills to programmatically compare
> address points to nearby streets. So I've added it to my list of manual
> post processing steps to check after a town file is generated.
>
> *"makes the following transformations, Ave -> Avenue"*
>> Please be careful with these translations. You don't want to translate
>> "Dr Albert Dr" into "Drive Albert Drive" (hint: doctor). Here's a full
>> list
>> <https://github.com/blackboxlogic/OsmTagsTranslator/blob/master/OsmTagsTranslator/Lookups/StreetSuffixes.json>
>> of road suffix translations. I may have more specific suggestions if I can
>> see a sample of your raw data source.
>>
>
> Alex pointed out to me that the source data breaks up the street name by
> its component parts. I've reworked my script to clean up and expand the
> parts and then concatenate them together. Thanks also for the full list of
> suffixes. I've incorporated it into my script.
>
>
>> *"Any address that already exist in OpenStreetMap will be removed"*
>> That sentence has a lot packed into it. Maybe describe your process? I
>> suggest that when you match elements in the data source to elements in OSM,
>> you take note on the distance between matched elements. If you choose not
>> to import it because there's a matching address but that match is miles
>> apart, then it would be a good candidate for human review.
>> When you look for matches in OSM, will you look at nodes, ways and
>> relations? Which fields will you consider for "matching"? Many OSM
>> addresses may not have a zipcode, state or town, will you consider those
>> matches?
>>
>
> I do not currently have an automated and highly accurate way of
> identifying existing OSM addresses. This is the primary reason for my plan
> to start with small towns with very few existing addresses. So far I've
> been using the points and polygons data from the OSM database (downloading the
> Vermont data from Geofabrik
> <https://osm-internal.download.geofabrik.de/north-america/us/vermont.html>
> and importing into Postgres). My hope is to make some progress with the
> easy towns... if I get some easy wins, I expect I'll be willing to devote
> more time to handling the tougher cases.
> I'm treating this import as a hybrid "mechanical turk
> <https://en.wikipedia.org/wiki/Mechanical_Turk>" style first step in
> hopes of making *some* progress... *any* progress. Almost 40% of Vermont
> towns have fewer than 100 existing OSM address points. My hope is to clean
> up the existing OSM addresses in those towns (completing addresses that
> are missing street names, numbers, etc.) and then do a manual (one-by-one)
> removal of those items from my generated list.
>
>
>> *" Address point data (primarily street name) will be transformed, and
>> expanded to meet OSM standards."*
>> This is easy to mess up. Please show full details of this process with
>> examples. Maybe link to your code.
>>
>
> I'll work on expanding this explanation, and share the script. You've
> already helped me make this better and more robust.
>
>
>> You didn't mention handling multiple addresses that are in exactly the
>> same spot.
>>
>
> Greg brought this up as well. I need to investigate this further. I
> haven't noticed this in the data so far, but it probably exists. I've
> added it to my to-do list.
>
> I have a full set of tools to do each step (translate, validate, conflate,
>> commit). I'm happy to share my tools with you. It might be hard to pick up
>> and use tools made by someone else but at least you could see which
>> operations they perform and compare that against your own.
>> I strongly suggest considering
>> https://github.com/blackboxlogic/OsmTagsTranslator as the only part of
>> my process that I really polished and thought could be used by others. It
>> helps with translation and validation if you are comfortable with sql.
>>
>> I'm also available to chat on the phone. If you want someone to talk
>> things through, email me and we can connect.
>>
>
> I'll plan on reaching out to you soon with my list of questions. Thanks
> again for taking the time to look through the current state of the project.
>
> Jared
>