<br><br><div class="gmail_quote">On 13 May 2010 17:23, Robert Scott <span dir="ltr">&lt;<a href="mailto:lists@humanleg.org.uk">lists@humanleg.org.uk</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

Hi all,<br>

<br>

I've been running some countrywide comparisons of the recently released OS Locator against the streets in OSM, using fuzzy string matching and the supplied bounding boxes to try to match each street in each dataset to one in the other. It has worked pretty well for most of the areas I tested. Of the ~826k named streets in OS Locator, about 424k have near-perfect matches in OSM. A few tens of thousands more have what I would call spelling 'disagreements'. The rest have poor matches or none at all.<br>


<br>
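A minimal sketch of this kind of matching, not the code behind the results above. It assumes each street is a (name, bounding box) pair with boxes given as (min_x, min_y, max_x, max_y), and it uses Python's standard difflib for the fuzzy comparison. The function names and the 0.8 threshold are illustrative assumptions.<br>

```python
from difflib import SequenceMatcher


def boxes_overlap(a, b):
    """Return True if two (min_x, min_y, max_x, max_y) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def similarity(a, b):
    """Case-insensitive similarity ratio between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(street, candidates, threshold=0.8):
    """Find the most similar candidate whose bounding box overlaps the street's.

    Returns (name, score), or (None, score) when the best score found is
    below the threshold (an assumed cut-off, chosen for illustration).
    """
    name, bbox = street
    best_name, best_score = None, 0.0
    for cand_name, cand_bbox in candidates:
        # Only compare streets that could plausibly be the same one.
        if not boxes_overlap(bbox, cand_bbox):
            continue
        score = similarity(name, cand_name)
        if score > best_score:
            best_name, best_score = cand_name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

With this sketch, a nearby "High Streat" would be preferred over an identically named "High Street" whose bounding box lies elsewhere, because the bounding-box test runs before any string comparison.<br>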

I've put a description of the technique up here along with the preliminary results:<br>

<br>

<a href="http://humanleg.org.uk/code/oslmusicalchairs" target="_blank">http://humanleg.org.uk/code/oslmusicalchairs</a><br>

<br>

What I really need is suggestions for getting this data to users in a form that's practical to work with. It's currently a CSV file.<br>

<br>

Thoughts welcome, as are bug reports of cases where my matching algorithm has got things wrong.<br></blockquote></div><br>What about using Double Metaphone to find the spelling disagreements?<br><br>Emilie Laffray<br>
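<br>To illustrate the idea of phonetic matching, here is a small sketch. Note that it uses classic Soundex rather than Double Metaphone, since Soundex is short enough to write out in full. Double Metaphone is more elaborate and would normally come from a library. The helper names are made up for this example.<br>

```python
# Soundex digit for each coded consonant; vowels, h, w and y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character American Soundex code of a word."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    out = [letters[0].upper()]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return "".join(out)[:4].ljust(4, "0")


def phonetic_key(street_name):
    """Hypothetical helper: one Soundex code per word of a street name."""
    return tuple(soundex(word) for word in street_name.split())


def sounds_alike(a, b):
    """True when two street names have the same phonetic key."""
    return phonetic_key(a) == phonetic_key(b)
```

Here "Grey Street" and "Gray Street" share a key, as do "Macdonald Road" and "McDonald Road", so pairs like these could be flagged as spelling disagreements rather than as missing streets.<br>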