[OSM-dev] extracting house-number from string

Stefan Breunig stefan at mathphys.fsk.uni-heidelberg.de
Sat Mar 14 02:48:44 GMT 2009


True about \d. But this regex doesn't match any house number in "3rd
street"; it only matches
[number][optionally a single letter][word boundary or end of string].
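To check which of the originally listed cases the quoted pattern matches, it can be run through Python's re module. This is only a sketch: the helper name find_house_number is hypothetical, re.IGNORECASE stands in for the /i flag, and it assumes the target regex engine treats \b and $ the way Python does.

```python
import re

# The pattern from the quoted message: one or more digits, an optional
# single letter, then either the end of the string or a word boundary.
# re.IGNORECASE plays the role of the /i flag.
HOUSE_NUMBER = re.compile(r"[0-9]+[a-z]?(?:$|\b)", re.IGNORECASE)


def find_house_number(address):
    """Return the first house-number-like token in address, or None.

    Hypothetical helper, for illustration only.
    """
    match = HOUSE_NUMBER.search(address)
    return match.group(0) if match else None


if __name__ == "__main__":
    # The cases collected in the original question, plus "5. avenue".
    for sample in ["xyz 12", "xyz 12b", "11b xyz", "11 xyz", "xyz",
                   "5th avenue", "3rd avenue", "2nd avenue",
                   "1st avenue", "5. avenue"]:
        print(repr(sample), "->", find_house_number(sample))
```

With these samples it returns "12", "12b", "11b" and "11" for the numbered cases, None for "xyz" and for all four ordinal avenues (after "5t" there is no word boundary, and backtracking to "5" does not help), and "5" for "5. avenue". Because nothing anchors the start of the match, a road name such as "A9" would still be picked up ("A9 road" yields "9"); prefixing the pattern with \b is one way to rule that out.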

Greetings
xeen

On Fri, Mar 13, 2009 at 19:11, Marcus Wolschon <Marcus at wolschon.biz> wrote:
> Stefan Breunig wrote:
>> /[0-9]+[a-z]?(?:$|\b)/i
>>
>> I guess a regular expression should be suitable. The above is from
>> my head and untested.
>>
>>   [0-9]+     – matches one or more digits
>>   [a-z]?     – optionally matches a single letter
>>   (?:$|\b)   – matches either the end of the string or a word
>>                boundary (space, dot, etc.)
>>   /i         – makes the search case-insensitive
>>
>> Should work for the examples you gave. You might want to limit
>> [a-z] to [a-l], since I have never come across an address like
>> "123 f"; that way it ignores some typos (e.g. "3r avenue"). No idea
>> how well it works for languages other than English. Note that the
>> regex will find "5" as the house number if someone writes, e.g.,
>> "5. avenue".
>
> Hello Stefan,
> I thought about such a regexp, but the issue is
> that in "5th street" it would match "5th", and there
> is neither a house-number "5th" nor a street named "street".
>
> If we can't find anything better, I'll probably
> have to go this way and make special cases for
> X+("st","nd","rd","th").
> I thought that maybe someone else had already encountered
> this issue and we could discuss how he/she dealt with it.
> (I am still quite puzzled about how things like the free-form
> address search in Google can actually work.)
>
> Sadly there are quite a lot of address formats. With
> French addresses having the house-number before the street
> and US street names that begin with numbers,
> it gets quite complicated.
>
> BTW: You should not use a-z but \D to stay Unicode-compatible.
> One developer of TS is Russian and I would hate to break
> address search for Cyrillic names for him ;)
>
> Marcus
>
>>
>>
>> On Fri, Mar 13, 2009 at 16:28, Marcus Wolschon
>> <Marcus at wolschon.biz> wrote:
>>> Hello,
>>>
>>> does anyone know a good algorithm to extract the house-number
>>> from a string containing street-name and house-number? Cases
>>> collected so far:
>>>
>>> // "xyz 12"
>>> // "xyz 12b"
>>> // "11b xyz"
>>> // "11 xyz"
>>> // "xyz"
>>> // "5th avenue"
>>> // "3rd avenue"
>>> // "2nd avenue"
>>> // "1st avenue"
>>>
>>> I need it for adding routing to house-numbers to the Traveling
>>> Salesman navigator.
>>>
>>>
>>> Marcus
>>>
>>> _______________________________________________ dev mailing list
>>> dev at openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
>>>
>>>
>>
>>
>>
>
>
>
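The Unicode point can be tested the same way. The sketch below assumes Python 3, where \w and \b are Unicode-aware for str patterns, and compares the [a-z] class with the suggested \D and with [^\W\d_], a common Python idiom for "any letter" (Python's re module has no \p{L}). The Cyrillic sample address and the helper name find_house_number are made up for illustration.

```python
import re

PATTERNS = {
    # Original suggestion: ASCII letters only (case-insensitive).
    "a-z": re.compile(r"[0-9]+[a-z]?(?:$|\b)", re.IGNORECASE),
    # \D matches *any* non-digit, including spaces and punctuation.
    "\\D": re.compile(r"[0-9]+\D?(?:$|\b)"),
    # [^\W\d_]: a word character that is neither a digit nor an
    # underscore, i.e. roughly "any Unicode letter" (Latin, Cyrillic, ...).
    "letter": re.compile(r"[0-9]+[^\W\d_]?(?:$|\b)"),
}


def find_house_number(address, kind="letter"):
    """Hypothetical helper: first match of the chosen pattern, or None."""
    match = PATTERNS[kind].search(address)
    return match.group(0) if match else None


if __name__ == "__main__":
    samples = ["xyz 12b", "11 xyz", "ул. Ленина 12б"]
    for kind in PATTERNS:
        print(kind, [find_house_number(s, kind) for s in samples])
```

With these samples, [a-z] finds nothing in the Cyrillic address: "б" is not in the class, and since it is a word character there is no word boundary right after the digits either. \D does find "12б", but it also swallows the space in "11 xyz" and returns "11 ". The [^\W\d_] class returns "12b", "11" and "12б" and, like the original pattern, still skips ordinals such as "5th avenue".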



-- 
Please encrypt your mail:
http://mathphys.fsk.uni-heidelberg.de/~stefan/publickey.asc
FP: 2620 E737 FD50 60AB 86B6 1B9D 3BFD AFFB 5B15 6893



