[Talk-us] TIGER road expansion code
Serge Wroclawski
emacsen at gmail.com
Sat May 12 04:20:11 BST 2012
Since the other thread has gotten a bit long, I want to start a new
thread to discuss the TIGER road expansion code.
The current version of the code is at: https://gist.github.com/2656735
Taking Dale's test cases, I've made a new version of the code and ran
it against Maryland. (I haven't put the new code up yet, but I can if
someone asks.)
This time, instead of 21 ambiguous names (expanding 99.9997% of ways),
it came up with only 1 ambiguous road (>99.9999% of ways). That one is
an interesting case where a user came in and modified the TIGER tags,
changing the tiger:name_base to the name while leaving the
tiger:name_type in place, so "Lyon Dr" was the name, "Dr" was the
name_type and "Lyon Dr" was the name_base. This seemed like an odd
case, and the script did the right thing.
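To make the "Lyon Dr" case concrete, here is a minimal sketch of one
way such a way could be detected. It is not taken from the script; the
function name base_repeats_type is hypothetical, and it assumes the
tags are available as a plain dict.

```python
def base_repeats_type(tags):
    """Return True when tiger:name_base already ends with tiger:name_type.

    Hypothetical helper, not part of the actual expansion script.
    """
    base_words = tags.get("tiger:name_base", "").split()
    name_type = tags.get("tiger:name_type", "")
    # A hand-edited base such as "Lyon Dr" with type "Dr" repeats the type,
    # so the tags no longer describe the name the way TIGER intended.
    return bool(name_type) and len(base_words) > 1 and base_words[-1] == name_type


edited = {"name": "Lyon Dr", "tiger:name_base": "Lyon Dr", "tiger:name_type": "Dr"}
print(base_repeats_type(edited))
```

A way flagged like this could be left alone and listed for manual
review rather than expanded.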
I looked over the other examples where the script would have punted
but now expanded, and it looks like it did the right thing, though
there may be some issues with the TIGER data. For example:
"W and W Industrial Rd" expands to "West and W Industrial Road". The
first W is the direction_prefix, but the second W is unaccounted for,
so the script doesn't know whether it is supposed to be W or West (and
neither do I). The old script would have punted, since it's ambiguous
which W should be expanded; the new one expands the first, since "W"
is the direction_prefix.
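The behaviour described above can be sketched roughly as follows. This
is an illustration only, not the code from the gist: the function name
expand_name, the two small abbreviation tables, and the tag key
tiger:name_direction_prefix are assumptions made for the example.

```python
# Hypothetical abbreviation tables; the real script's tables are not shown here.
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}
ROAD_TYPES = {"Rd": "Road", "Dr": "Drive", "St": "Street", "Ave": "Avenue"}


def expand_name(name, tags):
    """Expand only the words that the TIGER tags account for."""
    words = name.split()
    prefix = tags.get("tiger:name_direction_prefix")  # assumed key name
    road_type = tags.get("tiger:name_type")
    # Expand the leading word only when the tags say it is the direction prefix.
    if prefix and words and words[0] == prefix and prefix in DIRECTIONS:
        words[0] = DIRECTIONS[prefix]
    # Expand the trailing word only when the tags say it is the road type.
    if road_type and len(words) > 1 and words[-1] == road_type and road_type in ROAD_TYPES:
        words[-1] = ROAD_TYPES[road_type]
    # Any other abbreviation (such as the second "W") is left untouched.
    return " ".join(words)


tags = {"tiger:name_direction_prefix": "W", "tiger:name_type": "Rd"}
print(expand_name("W and W Industrial Rd", tags))  # West and W Industrial Road
```

Matching by position against the tags is what lets the first "W" be
expanded while the second is left as it is.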
I think that instead of focusing on these odd edge cases, we should
focus on the fact that we're now down to the 0.0001% of roads that
can't be expanded. We're going to have to accept some small error
rate, so instead of trying to fix these cases, we should decide how we
want to identify them.
As for the code itself, I'm happy to take feedback, but I'd find it
much easier to work with if that feedback came in the form of specific
code questions, patches, or specific real world examples.
- Serge