The examples are contrived because we're not testing.  We're pointing out why this is a bad idea.  Using real world examples would just encourage people to fix those examples and ignore the fact that the process is wrong.<div>

<br></div><div>Anyway, you realize that the road type doesn't always appear after the base name, right?</div><div><br>---------- Forwarded message ----------<br>From: <b>Serge Wroclawski</b> <br>Date: Friday, May 11, 2012<br>

Subject: [Talk-us] Fixing TIGER street name abbreviations<br>To: Dale Puch <<a href="mailto:dale.puch@gmail.com" target="_blank">dale.puch@gmail.com</a>><br>

Cc: <a href="mailto:talk-us@openstreetmap.org" target="_blank">talk-us@openstreetmap.org</a><br><br><br>On Fri, May 11, 2012 at 4:17 PM, Dale Puch <<a>dale.puch@gmail.com</a>> wrote:<br>


> I understand the script checks for only one instance of the abbreviation.<br>

<br>

> My point was what if someone manually expanded ONE of the abbreviations,<br>

> leaving "st something street"?  Is that checked for?<br>

<br>

I have a number of thoughts here:<br>

<br>

1.  Real world examples.<br>

<br>

Many of the examples I've seen are contrived. I'm glad we're testing,<br>

but testing needs to be based on actual data seen in the US dataset.<br>

<br>

That said:<br>

<br>

2. There are a couple of ways to handle this:<br>

<br>

* One way (the most conservative way) would be to test for untouched<br>

TIGER ways, that is, ways that are still at version 1. This<br>

would be a real problem, though, since there are lots of examples where<br>

someone may have fixed the geometry without touching the tags.<br>

<br>

* The other way is a method I'm using in an experimental branch of the<br>

code on my machine, which is to try to be a bit more selective about<br>

the expansions of road types. If we assume that the road type always<br>

appears after the base name, we can handle examples like (real<br>

world example) "St Marys St". The same would hold true for direction<br>

tags, so we'd be able to expand "E E St" confidently as well.<br>

<br>

But there's a catch. If someone had edited the name of the<br>

above street from the original "St Marys St" to "St. Marys St" then<br>

that test would fail, and the expansion would never occur, whereas in<br>

the current version, it would.<br>
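<br>

As a rough illustration of the positional idea in the second bullet, here is a minimal sketch. It is not the code from the experimental branch: the lookup tables and the function name expand_name are hypothetical, and it assumes the road type is always the last token and a direction is always the first.<br>

```python
# Hypothetical abbreviation tables; a real run would need the full TIGER lists.
ROAD_TYPES = {"St": "Street", "Ave": "Avenue", "Rd": "Road", "Dr": "Drive"}
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}


def expand_name(name):
    """Expand a leading direction and a trailing road type, by position only."""
    tokens = name.split()
    if len(tokens) < 2:
        return name  # a single token gives no position to anchor on

    # Assumption: only the last token can be a road type.
    if tokens[-1] in ROAD_TYPES:
        tokens[-1] = ROAD_TYPES[tokens[-1]]

    # Assumption: only the first token can be a direction, and only when a
    # base name remains between it and the final token.
    if len(tokens) >= 3 and tokens[0] in DIRECTIONS:
        tokens[0] = DIRECTIONS[tokens[0]]

    return " ".join(tokens)


print(expand_name("St Marys St"))
print(expand_name("E E St"))
```

Under these assumptions the leading "St" in "St Marys St" is left alone and only the trailing one becomes "Street". How an edited name such as "St. Marys St" behaves depends on how the real branch matches tokens, which this sketch does not model.<br>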

<br>

So:<br>

<br>

3. Any method used is going to produce some number of either<br>

false positives or false negatives. I contend that the number of<br>

errors in either case will be so tiny that it will be lost in the<br>

noise, but there's no way to promise it will always be 0. The best we<br>

can do is toss out uncertain expansions and have them handled manually<br>

(which is something I'm working to make better in the next version of<br>

the code as well).<br>
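<br>

One way to picture "toss out uncertain expansions" is a small triage step. This is only a sketch under assumed rules, not the behaviour of the actual script: the table and the function name triage are hypothetical. A name is treated as uncertain when it has more than one road-type abbreviation, or when an abbreviation sits alongside an already expanded road type.<br>

```python
# Hypothetical abbreviation table, for illustration only.
ROAD_TYPES = {"St": "Street", "Ave": "Avenue", "Rd": "Road"}


def triage(name):
    """Return ("expand", new_name), ("manual", name) or ("skip", name)."""
    tokens = name.split()
    # Positions of tokens that look like road-type abbreviations.
    abbreviated = [i for i, t in enumerate(tokens) if t in ROAD_TYPES]
    # Tokens that are already a fully expanded road type.
    expanded = [t for t in tokens if t in ROAD_TYPES.values()]

    if not abbreviated:
        return ("skip", name)  # nothing to expand
    if len(abbreviated) > 1 or expanded:
        return ("manual", name)  # ambiguous, leave it for a human
    i = abbreviated[0]
    tokens[i] = ROAD_TYPES[tokens[i]]
    return ("expand", " ".join(tokens))


for example in ["Main St", "St Marys St", "St Something Street"]:
    print(triage(example))
```

With these rules a half-expanded name like "St Something Street" lands in the manual pile rather than being expanded automatically.<br>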

<br>

But:<br>

<br>

4. I don't want us to rely on cleverness. I'd much rather rely on<br>

people testing the code with real world inputs and checking the<br>

outputs.<br>

<br>

<br>

I should have a new version of the code either tonight or tomorrow,<br>

with the new expansion rules.<br>

<br>

- Serge<br>

<br>


<br>

</div>