[Talk-us] 'address' tags in Massachusetts

Max Erickson maxerickson at gmail.com
Mon Mar 26 01:17:08 UTC 2018


>There is a talk-us-massachusetts@ and I think review of your proposed
>mechanical edit should include that list.

Okay. This is still pretty preliminary; I just decided that feedback
was a good idea. Does that list overlap enough with this one that your
forwarding the message would be sufficient?

>I suspect people would be amenable, but it would be good to publish the
>code, and the proposed files to upload, so that they can be reviewed.

I've put the code at https://github.com/maxerickson/massadd

I guess I should have mentioned in the earlier message that it does
some (conservative) formatting cleanup: tidying uppercase and
expanding a small number of abbreviations.
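
To give a rough idea of what that kind of cleanup looks like, here is a
minimal sketch. It is not the code from the repository: the function
name tidy_street and the abbreviation table are made up for
illustration, and the real script's list of abbreviations may differ.

```python
# Hypothetical abbreviation table (an assumption, not the script's list).
ABBREVIATIONS = {"St": "Street", "Rd": "Road", "Ave": "Avenue", "Dr": "Drive"}


def tidy_street(name: str) -> str:
    """Conservatively tidy a street name.

    Only all-uppercase input is re-cased, and only a trailing
    abbreviation is expanded, so "St Botolph St" keeps its leading "St".
    """
    words = name.split()
    if name.isupper():
        # Re-case only when the whole value is uppercase.
        words = [w.capitalize() for w in words]
    if words:
        last = words[-1].rstrip(".")
        if last in ABBREVIATIONS:
            words[-1] = ABBREVIATIONS[last]
    return " ".join(words)
```

Restricting the expansion to the last word is one way to stay
conservative; anything that doesn't match is returned unchanged apart
from whitespace normalization.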

There's no OSM file because I don't think it is ready for upload (it's
quick enough to generate if you have curl and python3 installed). The
data at the gist does fully reflect the changes the script would
currently make.

>Also, it certainly makes sense that there are some values that are hard
>to parse.  The obvious approach is to just leave them out (and leave
>them for manual fixing or another day), but I'm not really sure exactly
>what you are proposing to do.

At the moment, if the 'address' field does not parse without error, the
script doesn't do anything with that object. There are some 'address'
fields that are parsed incorrectly (mostly they include a PO box or a
unit and could be excluded by matching for those). That's part of why
it isn't ready.
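
The "skip on failure, and exclude PO boxes and units by matching" idea
could be sketched roughly as below. This is only an illustration under
assumptions: the "<number> <street>, <city>" layout, both regular
expressions, and the name split_address are hypothetical and are not
taken from the repository.

```python
import re

# Assumed layout: "<housenumber> <street>[, <city>]". Real values are messier.
ADDRESS_RE = re.compile(
    r"^(?P<housenumber>\d+[A-Za-z]?)\s+(?P<street>[^,]+?)\s*"
    r"(?:,\s*(?P<city>[^,]+))?$"
)
# Values that mention a PO box or a unit are excluded rather than guessed at.
EXCLUDE_RE = re.compile(
    r"\b(p\.?\s*o\.?\s*box|unit|apt|suite|ste)\b|#", re.IGNORECASE
)


def split_address(value):
    """Return a dict of addr:* tags, or None to leave the object untouched."""
    value = value.strip()
    if EXCLUDE_RE.search(value):
        return None  # PO box or unit: leave for manual fixing
    match = ADDRESS_RE.match(value)
    if match is None:
        return None  # did not parse: do nothing with this object
    tags = {
        "addr:housenumber": match.group("housenumber"),
        "addr:street": match.group("street").strip(),
    }
    if match.group("city"):
        tags["addr:city"] = match.group("city").strip()
    return tags
```

Returning None for anything doubtful keeps the edit limited to values
that parse cleanly; everything else stays as it is for manual review.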

>As for nodes with both the old addr tag and new ones, you imply that the
>simplest way is to clean them up before a mechanical edit.  But that
>implies that if they aren't fixed first, you might do something to those
>nodes, and that seems against the spirit of the mechanical edit policy,
>which involves refraining from changes that you can't basically prove
>are correct.  But I don't expect you'd be doing that, so perhaps you can
>expand on what you meant.

Yeah, it isn't implemented yet, but that is what I would do: skip those objects.
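
A check along those lines might look like the sketch below. Again this
is hypothetical and not yet in the script; it assumes each object is a
dict with a "tags" mapping, and the function names are invented here.

```python
def has_conflicting_tags(tags):
    """True if an object has 'address' plus at least one addr:* tag."""
    return "address" in tags and any(key.startswith("addr:") for key in tags)


def select_candidates(objects):
    """Keep only objects with an 'address' tag and no existing addr:* tags."""
    return [
        obj
        for obj in objects
        if "address" in obj["tags"] and not has_conflicting_tags(obj["tags"])
    ]
```

Objects that fail the check are simply left out of the candidate list,
so they would not be modified at all.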

Overall I'm not in any hurry, but the majority of the addresses parse
correctly, and after an edit they would be used by more data consumers
and show up more clearly in editors and so on. It's something like 100
manual fixes to clear the way for about 3900 automatic splits.


Max


