<div dir="ltr">Update for progress on planning the Vermont address import...<div><br></div><div>I had a good call and out-of-band emails with Alex who has been helpful with tips and advice. As a result, I've refined my process for finding existing OSM addresses (I am pulling from a Postgres database, as well as using an Overpass query for any nodes or ways that are within a town boundary.)</div><div><br></div><div>I've updated <a href="https://wiki.openstreetmap.org/wiki/VCGI_E911_address_points_import">the wiki page</a> to clarify parts of the plan, and limit the scope. My intent is to focus on the lowest hanging fruit using manual verification (eg. towns with less than 100 existing OSM addresses). If I develop tools to help with automation, I'll report back with plans for processing towns that have more than 100 existing OSM addresses.</div><div><br></div><div>I've created a <a href="https://docs.google.com/spreadsheets/d/1N_vGbQENK6owBKjX-u52dTViGks5PYqv0QyhIgbdBaI/edit?usp=sharing">Google sheet here</a> that contains the towns, and how many existing OSM addresses they currently have. It also shows how many VCGI addresses exist for the town. You can look at this to get a sense of which towns I plan on working on first.</div><div><br></div><div>I've started a git repo here: <span style="font-family:Helvetica;font-size:12px"><a href="https://github.com/JaredOSM/vermont-address-import">https://github.com/JaredOSM/vermont-address-import</a></span></div><div><span style="font-family:Helvetica;font-size:12px">It currently contains a script that processes street names to conform to OSM standards.</span></div><div><span style="font-family:Helvetica;font-size:12px">I've placed the VCGI address point data files for each town that I plan to process. And as I create OSM files, I'll store them here for community review.</span></div><div><span style="font-family:Helvetica;font-size:12px"><br></span></div><div><span style="font-family:Helvetica;font-size:12px">I evaluated the VCGI data for duplicate addresses (eg. nodes that have the exact same longitude and latitude). I only found two pairs of nodes in the entire state. Apartment addresses from what I can tell or kept separate and are not placed on top of each other. </span></div><div><span style="font-family:Helvetica;font-size:12px"><br></span></div><div><font face="Helvetica"><span style="font-size:12px">I believe I've addressed all the questions that have been raised so far, but if I've missed anything, or anyone else has remaining questions or concerns, please let me know.</span></font></div><div><br></div><div>I will continue to wait for a while to make sure all concerns are addressed. In the meantime, I'll plan on generating some additional draft import files.</div><div><br></div><div>Thanks,</div><div>Jared</div>
</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Sep 21, 2022 at 7:43 PM Jared <<a href="mailto:osm@wuntu.org">osm@wuntu.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Alex thanks for the encouragement, advice, and warnings.</div><div><br></div><div>Responses to your comments below.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">My biggest regret is that I should have done the <b>municipal boundary import BEFORE doing the address import</b>. Without those boundaries I had no way to validate the addr:city tag, which ended up being unexpectedly bad for a variety of reasons. This created a lot of extra work after which would have been easier to deal with before. I just glanced at Vermont and it looks like you don't have your municipal boundaries either so <i>this warning applies to you</i>.</blockquote><div><br></div><div>I'd like to chat with you more about this as it sounds important.</div><div><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><b>"The goal is to import missing Vermont addresses."</b></div><div>I
suggest having accuracy as part of your goal. You don't need to
publicly announce it, but it will help you evaluate decisions. 99%
accuracy would be awesome, 95% accuracy would be a little sad.<br></div></div></blockquote><div><br></div><div> With the Maine import, how did you assess this accuracy? Or, how would you suggest I go about determining accuracy?</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><b>"Larger towns ... skipped"</b><br></div><div>Please keep a list of skipped towns on the wiki so others can follow in your footsteps.</div></div></blockquote><div><br></div><div>I'm thinking of having a table of town names on the wiki page with their progress. Let me know if you came up with a good system for Maine.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><b>"Esri has evaluated the data set"</b><br></div><div>I
reviewed many of the address data sets that Esri published to RapiD and
found they didn't even attempt any validation, which was extremely
troubling. Please don't accept Esri's review as an endorsement of data
quality. For example, you have several "<tag k='addr:housenumber'
v='0' />" in your sample OSM file. I did a frequency analysis of
Maine numbers and discovered that the state used house number "999" as
"we don't know this house number". Consider doing similar with your
data. They should be positive, non-zero, numeric, non-empty, and there shouldn't be
any unusually high occurrences of any single number. There shouldn't be any duplicates. Data quality will
vary greatly town by town. You need to re-validate each town
independently because <b>they will have different problems</b>.<br></div></div></blockquote><div><br></div><div>I've improved my script to further validate house numbers: they must exist, be numeric, and be greater than zero.</div><div> </div>
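<div>Concretely, the check is shaped roughly like this (a simplified Python sketch; the real script lives in the repo, and the frequency threshold here is an arbitrary placeholder):</div><pre>
from collections import Counter

def valid_housenumber(value):
    """A house number must exist, be numeric, and be greater than zero."""
    return value is not None and value.strip().isdigit() and int(value) > 0

def suspicious_housenumbers(values, threshold=50):
    """Flag unusually common numbers (like Maine's "999" placeholder)
    for manual review, per Alex's frequency-analysis suggestion."""
    counts = Counter(v.strip() for v in values if valid_housenumber(v))
    return [(num, n) for num, n in counts.most_common() if n >= threshold]
</pre><div> </div>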
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div></div><div><b>"Tagging Plans"</b><br></div><div>You've done the obvious address translations; there may be more useful data
in the data source which could translate to other OSM tags. If you post
an example record from the data source then reviewers may be able to
spot those.
You didn't mention apartment numbers?</div></div></blockquote><div><br></div><div>The source data does include site type (single-family house, mobile home, etc.), but on the local-vermont Slack channel we decided the site type should be associated with the building outline, not the address point.</div><div><br></div><div>Relevant source data links are below. If anyone else sees other pieces that should be included, let me know.</div><div>About the VCGI dataset: <a href="https://geodata.vermont.gov/datasets/VCGI::vt-data-e911-site-locations-address-points-1/about" target="_blank">https://geodata.vermont.gov/datasets/VCGI::vt-data-e911-site-locations-address-points-1/about</a></div><div>View the data in table form:</div><div><a href="https://geodata.vermont.gov/datasets/VCGI::vt-data-e911-site-locations-address-points-1/explore?showTable=true" target="_blank">https://geodata.vermont.gov/datasets/VCGI::vt-data-e911-site-locations-address-points-1/explore?showTable=true</a><br></div><div>Metadata about the fields:</div><div><a href="https://maps.vcgi.vermont.gov/gisdata/metadata/EmergencyE911_ESITE.htm" target="_blank">https://maps.vcgi.vermont.gov/gisdata/metadata/EmergencyE911_ESITE.htm</a></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>
</div><div><b>"Data Transformation: Title Cases"</b><br></div><div>You <i>can</i>
do that, but it will be wrong sometimes for things like "McJagger's
Lane". I used a script which pulled the character casing from nearby
OSM roads with the same name spelling (ignoring whitespace, punctuation
and accents). It wasn't too much work and it produced very good results.</div></div></blockquote><div><br></div><div>I've updated the script to capitalize the letter after Mc. I'm sure there are exceptions, and other types of non-trivial capitalization and punctuation. I don't currently have the skills to programmatically compare address points to nearby streets, so I've added that comparison to my list of manual post-processing steps to check after a town file is generated.</div>
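<div>For reference, the Mc handling is roughly this (a sketch of the idea; the version in the repo may differ in detail):</div><pre>
import re

def title_case_street(name):
    """Title-case an all-caps street name, then patch known exceptions."""
    name = name.title()                  # "MCJAGGER'S LANE" -> "Mcjagger'S Lane"
    name = re.sub(r"'S\b", "'s", name)   # str.title() also capitalizes after apostrophes
    name = re.sub(r"\bMc(\w)", lambda m: "Mc" + m.group(1).upper(), name)
    return name                          # -> "McJagger's Lane"
</pre>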
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><b>"makes the following transformations, Ave -> Avenue"</b></div><div>Please be careful with these translations. You don't want to translate "Dr
Albert Dr" into "Drive Albert Drive" (hint: doctor). Here's a <a href="https://github.com/blackboxlogic/OsmTagsTranslator/blob/master/OsmTagsTranslator/Lookups/StreetSuffixes.json" target="_blank">full list</a> of road suffix translations. I may have more specific suggestions if I can see a sample of your raw data source.<br></div></div></blockquote><div><br></div><div>Alex pointed out to me that the source data breaks up the street name by its component parts. I've reworked my script to clean up and expand the parts and then concatenate them together. Thanks also for the full list of suffixes. I've incorporated it into my script.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div></div><div><b>"Any address that already exist in OpenStreetMap will be removed"</b><br></div><div>That
<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div></div><div><b>"Any address that already exist in OpenStreetMap will be removed"</b><br></div><div>That sentence has a lot packed into it. Maybe describe your process? I
suggest that when you match elements in the data source to elements in
OSM, you take note of the distance between matched elements. If you
choose not to import it because there's a matching address but that
match is miles apart, then it would be a good candidate for human
review.</div><div>When you look for matches in OSM, will you look at
nodes, ways and relations? Which fields will you consider for
"matching"? Many OSM addresses may not have a zipcode, state or town,
will you consider those matches?<br></div></div></blockquote><div><br></div><div>I do not currently have an automated and highly accurate way of identifying existing OSM addresses. This is the primary reason for my plan to start with small towns with very few existing addresses. So far I've been using the points and polygons data from the OSM database (downloading <a href="https://osm-internal.download.geofabrik.de/north-america/us/vermont.html" target="_blank">the Vermont data from Geofabrik</a> and importing into Postgres). My hope is to make some progress with the easy towns... if I get some easy wins, I'm hoping I'll be willing to devote more time to handling the tougher cases. </div><div>I'm treating this import as a hybrid "<a href="https://en.wikipedia.org/wiki/Mechanical_Turk" target="_blank">mechanical turk</a>" style first step in hopes of making <i>some</i> progress... <i>any</i> progress. Almost 40% of Vermont towns have less than 100 existing OSM address points. My hope is to clean the existing OSM towns addresses (complete addresses that are missing Street names, numbers, etc.) and then do a manual (1 by 1) removal of those items from my generated list.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div></div><div><b>"
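<div>If I do get to automating the matching, Alex's distance suggestion would look something like this sketch (nothing like this exists in my repo yet; the field names and the 200 meter review threshold are placeholders):</div><pre>
import math

def distance_m(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in meters (haversine)."""
    r = 6371000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def match_key(addr):
    """Match on housenumber + street only; many existing OSM
    addresses lack city, state, or zip, so don't require those."""
    return (addr["housenumber"], addr["street"].lower())

def conflate(vcgi_points, osm_points, review_distance=200):
    """Skip VCGI points whose address already exists in OSM, but
    flag matches that are far apart for human review."""
    existing = {match_key(o): o for o in osm_points}
    to_import, to_review = [], []
    for p in vcgi_points:
        match = existing.get(match_key(p))
        if match is None:
            to_import.append(p)
        elif distance_m(p["lat"], p["lon"], match["lat"], match["lon"]) > review_distance:
            to_review.append((p, match))
    return to_import, to_review
</pre>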
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div></div><div><b>"Address point data (primarily street name) will be transformed, and expanded to meet OSM standards."</b><br></div><div>This is easy to mess up. Please show full details of this process with examples. Maybe link to your code.</div></div></blockquote><div><br></div><div>I'll work on expanding this explanation and share the script. You've already helped me make this better and more robust.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>You didn't mention handling multiple addresses that are in exactly the same spot.<br></div></div></blockquote><div><br></div><div>Greg brought this up as well. I need to investigate this further. I haven't noticed this in the data so far, but it probably exists. I've added it to my to-do list.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div></div><div>I
have a full set of tools to do each step (translate, validate,
conflate, commit). I'm happy to share my tools with you. It might be
hard to pick up and use tools made by someone else, but at least you
could see which operations they perform and compare that against
your own.<br></div><div>I strongly suggest considering
<a href="https://github.com/blackboxlogic/OsmTagsTranslator" target="_blank">https://github.com/blackboxlogic/OsmTagsTranslator</a>; it is the only part of
my process that I really polished and thought could be used by others.
It helps with translation and validation if you are comfortable with
SQL.</div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I'm also available to chat on the phone. If you want someone to talk things through, email me and we can connect.<br></div></div></blockquote><div><br></div><div>I'll plan on reaching out to you soon with my list of questions. Thanks again for taking the time to look through the current state of the project.</div><div><br></div><div>Jared</div></div></div>
</blockquote></div>