[OSM-talk] Name finder: "Budapesterplatz" vs. "Budapester platz"
david at frankieandshadow.com
Fri Aug 17 02:23:07 BST 2007
On 17/08/2007 01:17, Thomas Krüger wrote:
> David Earl wrote:
>> While I'm at it, what kind of abbreviation is common in these cases? Do
>> people write Hauptbahnstr for Hauptbahnstrasse or Hauptbahnstraße? Would
>> you ever write Budapesterpl. or would it naturally always be Budapester
>> Pl. abbreviated, if indeed abbreviation is used at all?
> All versions should be possible. My recommendation is to generalize the
> names before indexing AND searching. Typical patterns defined by regular
> expressions should be mapped to the same strings:
> - strip off all characters except letters, numbers and dots
> - make the whole string lowercase
> - Apply the regular expression replacements, examples:
> * "street", "str\.?", "stra(ss|ß)e" -> "str"
> * "rd\.?", "road" -> "rd"
> * ...
That's essentially what I do. But I am dealing only with whole words at
a time: the problem is the equivalence between "X Y" and "XY", not the
equivalence between "XY" and "XZ" (except that I don't know all the
examples of Y and Z for languages other than English, which is what I
was trying to get at).
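
To make the "X Y" versus "XY" case concrete, here is a rough sketch of one
way it might be handled: treat each known suffix as matching at the end of
a word whether or not whitespace precedes it, and glue the canonical form
onto the preceding word. This is not what the name finder currently does.
The SUFFIXES table and the normalise() function are made up for
illustration, the suffix list is nowhere near complete, and spaces are
kept during the stripping step (unlike the recipe quoted above) so that
word ends can still be detected.

```python
import re

# Hypothetical suffix table (pattern -> canonical form); illustrative only.
SUFFIXES = [
    (r"stra(?:ss|ß)e|street|str\.?", "str"),
    (r"platz|pl\.?", "pl"),
    (r"road|rd\.?", "rd"),
]


def normalise(name):
    """Return a key that is the same for "X Y", "XY" and abbreviated forms."""
    s = name.lower()
    # Keep letters, digits, dots and spaces; drop everything else.
    s = re.sub(r"[^\w. ]|_", "", s)
    for pattern, canonical in SUFFIXES:
        # Match the suffix at the end of a word, swallowing any whitespace
        # in front of it, so "budapester platz" and "budapesterplatz"
        # both become "budapesterpl".
        s = re.sub(r"\s*(?:" + pattern + r")(?=\s|$)", canonical, s)
    # Collapse whatever whitespace is left.
    return " ".join(s.split())


if __name__ == "__main__":
    for example in ("Budapesterplatz", "Budapester platz", "Budapester Pl."):
        print(example, "->", normalise(example))
```

The same function would have to be applied both when indexing and when
searching. One obvious weakness: any word that merely ends in a suffix
gets rewritten too (English "Broad" ends in "road"), so a real table would
need more care, particularly for languages I don't know.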