[OSM-talk] Name finder: "Budapesterplatz" vs. "Budapester platz"

David Earl david at frankieandshadow.com
Fri Aug 17 02:23:07 BST 2007


On 17/08/2007 01:17, Thomas Krüger wrote:
> David Earl wrote:
>> While I'm at it, what kind of abbreviation is common in these cases? Do 
>> people write Hauptbahnstr for Hauptbahnstrasse or Hauptbahnstraße? Would 
>> you ever write Budapesterpl. or would it naturally always be Budapester 
>> Pl. abbreviated, if indeed abbreviation is used at all?
> 
> All versions should be possible. My recommendation is to generalize the
> names before indexing AND searching. Typical patterns defined by regular
> expressions should be mapped to the same strings:
> 
> - strip off all characters except letters, numbers and dots
> - make the whole string lowercase
> - Apply the regular expression replacements, examples:
>   * "street", "str\.?", "stra(ss|ß)e" -> "str"
>   * "rd\.?", "road" -> "rd"
>   * ...

That's essentially what I do. But I am dealing only with whole words at 
a time - it is the equivalence between "X Y" and "XY" that is the problem, 
not the equivalence between "XY" and "XZ" (except that I don't know all the 
examples of Y and Z here for languages other than English, which is 
what I was trying to get at).
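
As a rough illustration of the recipe quoted above, here is a minimal Python
sketch. It is not the name finder's actual code: the function name
normalize and the SUFFIX_RULES table are hypothetical, and the suffix list
is a tiny sample. Because stripping everything except letters, numbers and
dots also removes spaces, "X Y" and "XY" end up with the same key, provided
the suffix patterns are matched at the end of the string rather than per
word.

```python
import re

# Illustrative suffix table only (an assumption, not a complete list);
# each language would need its own, much longer, set of entries.
SUFFIX_RULES = [
    (re.compile(r"(?:stra(?:ss|ß)e|street|str\.?)$"), "str"),
    (re.compile(r"(?:platz|pl\.?)$"), "pl"),
    (re.compile(r"(?:road|rd\.?)$"), "rd"),
]


def normalize(name):
    """Return a key that ignores case, spacing and suffix abbreviation."""
    # Lowercase, then keep only letters, digits and dots.
    # Dropping spaces here is what makes "X Y" and "XY" compare equal.
    key = "".join(ch for ch in name.lower() if ch.isalnum() or ch == ".")
    # Rewrite the first suffix pattern that matches at the end of the key.
    for pattern, replacement in SUFFIX_RULES:
        new_key, count = pattern.subn(replacement, key)
        if count:
            return new_key
    return key


if __name__ == "__main__":
    for example in ("Budapesterplatz", "Budapester platz", "Budapester Pl."):
        print(example, "->", normalize(example))
```

The same function would be applied to names when indexing and to queries
when searching. One known weakness of matching on the run-together string is
false positives: a name that merely ends in "road", such as "Broad", would be
shortened as well, so a real implementation would need extra checks.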

David






