[Talk-ca] Preferred phone number format

Matthew Darwin matthew at mdarwin.ca
Wed Feb 7 23:46:18 UTC 2018


A further update on this work:

  * I found yet more bizarre phone-related tags, such as "phone:1" and
    "telephone".  These have all been tidied.  My osmfilter now looks
    like this:    --keep="contact:*=* or phone*=* or Phone*=* or
    alt_phone=* or fax*=* or tty*=*"  Suggestions for additional keys
    to search on are welcome, so that I catch all phone numbers.
  * Some formats are used very regionally, e.g. Edmonton schools
    consistently use one format and Ottawa schools consistently use
    a different one.
  * The canada.poly filter I have been using includes Saint Pierre and
    Miquelon (which does not use the North American Numbering Plan),
    as well as a few US entries (especially relations that run near
    the border). If anyone knows of a tighter canada.poly, can you
    point me to it?  I am generally leaving non-Canadian entries
    alone, but they do count in the stats below.
  * There are now 67 unique tag/phone number format combinations (down
    from 400+ originally), counted with:

      egrep -i 'k="[a-z:]*(phone|fax|tty)[a-z:]*" ' $OSMFILENAME |
        cut -d\" -f2,4 |
        sed -e 's/[0-9]/#/g' | sed -e 's/[A-Z]/A/g' |
        sed -e 's/([a-zA-Z -]*)/(...)/g' |
        sort | uniq -c | sort -nr | wc -l
  * The bulk of the remaining work is to reformat the big groups of
    numbers that do not begin with "+1".  I will make changes by
    area code to limit the number of Canada-wide changesets.
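
For anyone who would rather not chain sed calls, the counting pipeline above can be approximated in Python. This is only a rough sketch: it assumes the tags have already been extracted as (key, value) pairs, and the names KEY_PATTERN, is_phone_key, signature and count_signatures are made up for illustration. The sample numbers are fictional.

```python
import re
from collections import Counter

# Same key test as the egrep pattern: only letters and colons are allowed
# around "phone", "fax" or "tty" (case-insensitive).
KEY_PATTERN = re.compile(r"[a-z:]*(phone|fax|tty)[a-z:]*", re.IGNORECASE)


def is_phone_key(key):
    """Return True if the whole key matches the egrep-style key pattern."""
    return KEY_PATTERN.fullmatch(key) is not None


def signature(key, value):
    """Reduce a tag to a format signature such as phone"+#-###-###-####."""
    sig = key + '"' + value                          # cut -d\" -f2,4 joins with a quote
    sig = re.sub(r"[0-9]", "#", sig)                 # every digit becomes #
    sig = re.sub(r"[A-Z]", "A", sig)                 # every uppercase letter becomes A
    sig = re.sub(r"\([a-zA-Z -]*\)", "(...)", sig)   # collapse parenthesized text
    return sig


def count_signatures(tags):
    """Count signatures over an iterable of (key, value) pairs."""
    return Counter(signature(k, v) for k, v in tags if is_phone_key(k))


if __name__ == "__main__":
    sample = [
        ("phone", "+1-613-555-0100"),
        ("phone", "+1-780-555-0101"),
        ("fax", "613-555-0102"),
        ("name", "Example"),
    ]
    for sig, n in count_signatures(sample).most_common():
        print(f"{n:8d} {sig}")
```

One thing the sketch makes visible: because the key pattern only allows letters and colons, a key such as alt_phone is not counted, even though the osmfilter expression keeps it.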


As always, comments welcome.

Here is the new "top 20" as of ~10am ET today:

   12555 phone"+#-###-###-####
    4453 phone"+# ###-###-####
    4060 phone"###-###-####
    3749 phone"+# ### ### ####
    2624 phone"+# ### ###-####
    2239 phone"(###) ###-####
    1292 fax"+#-###-###-####
    1032 phone"##########
     941 contact:phone"+#-###-###-####
     323 phone"+###########
     322 phone"+# ### #######
     158 contact:fax"+#-###-###-####
     117 phone:tollfree"+#-###-###-####
     109 phone"###-####
      39 phone"+#-###-###-####;+#-###-###-####
      25 phone"+#-###-###-AAAA
      23 phone"+#-###-###-####x###
      17 phone"+# (###) ###-####
      14 phone"+#-###-###-####x####
       9 phone"+#-###-###-####x#
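
To illustrate what reformatting the numbers that do not begin with "+1" might look like, here is a minimal Python sketch. It assumes the target format is +1-###-###-####, chosen here only because it is the most common format in the list above, not because it has been agreed on. The function name normalize_nanp is hypothetical, and anything that is not a plain 10 or 11 digit number (extensions, vanity letters, multiple values, 7-digit numbers, non-"+1" numbers) is deliberately returned as None so it can be reviewed by hand.

```python
import re
from typing import Optional

# Only digits and common separators are accepted; anything else
# (letters, "x" extensions, ";" lists) is left for manual review.
_ALLOWED = re.compile(r"[0-9+(). -]+")


def normalize_nanp(raw: str) -> Optional[str]:
    """Return '+1-###-###-####' for a plain NANP number, else None."""
    value = raw.strip()
    if not _ALLOWED.fullmatch(value):
        return None
    digits = re.sub(r"[^0-9]", "", value)
    has_country_code = len(digits) == 11 and digits[0] == "1"
    # A leading "+" must be followed by country code 1 and ten digits.
    if value.startswith("+") and not has_country_code:
        return None
    if has_country_code:
        digits = digits[1:]
    if len(digits) != 10:
        return None
    return f"+1-{digits[:3]}-{digits[3:6]}-{digits[6:]}"


if __name__ == "__main__":
    for raw in ["613-555-0100", "(613) 555-0100", "+1 613 555 0100",
                "555-0100", "+1-613-555-0100x123", "+508 41 12 34"]:
        print(repr(raw), "->", normalize_nanp(raw))
```

Returning None instead of guessing keeps the Saint Pierre and Miquelon and other non-Canadian entries untouched, in line with leaving those alone.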



On 2018-02-04 11:49 PM, OSM Volunteer stevea wrote:
> On Feb 4, 2018, at 8:37 PM, Matthew Darwin <matthew at mdarwin.ca> wrote:
>> Just an update on this activity.
> Again, nice work!
>
>> Here are the top 20 tags as of ~4pm ET Sunday:
>>
>>    10669 phone"+#-###-###-####
>>     4392 phone"+# ###-###-####
>>     4206 phone"###-###-####
>>     2970 phone"+# ### ### ####
>>     2540 phone"+# ### ###-####
>>     2451 phone"(###) ###-####
>>     1076 phone"##########
>>      659 phone"+# ### #######
>>      547 fax"+#-###-###-####
>>      522 contact:phone"+#-###-###-####
>>      516 phone"+###########
>>      456 phone"#-###-###-####
>>      446 phone"### ### ####
>>      378 fax"+# ###-###-####
>>      283 contact:phone"### ###-####
>>      260 phone"+# (###) ###-####
>>      200 fax"+###########
>>      186 phone"### ###-####
>>      170 phone"(###)###-####
>>      162 fax"+# ### ###-####
> I'd appreciate others to chime in about this, but it seems where dashes and space characters overlap (are the only difference in format), those can be conflated together.  I'm not sure whether dash or space ends up as "the winner," but this should reduce the number of categories.
>
> As you consider additional conflations, you may be able to do this again and again, getting it down to a fairly small number of formats.  I urge additional feedback (here would be good) before additional conflations, but (I keep saying it):  nice work.
>
> SteveA
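
SteveA's idea of conflating formats that differ only in dashes versus spaces can be tried out on the signature counts before touching any data. The following Python sketch is just one way to do it: the names conflate and merge_counts are invented for illustration, and the underscore is an arbitrary placeholder that stands for "dash or space" without picking a winner. The counts in the example are taken from the quoted Sunday list.

```python
import re
from collections import Counter


def conflate(signature):
    """Replace every dash or space in the value part with '_'."""
    key, _, value = signature.partition('"')
    return key + '"' + re.sub(r"[- ]", "_", value)


def merge_counts(counts):
    """Sum the counts of signatures that conflate to the same form."""
    merged = Counter()
    for sig, n in counts.items():
        merged[conflate(sig)] += n
    return merged


if __name__ == "__main__":
    sunday = {
        'phone"+#-###-###-####': 10669,
        'phone"+# ###-###-####': 4392,
        'phone"+# ### ### ####': 2970,
        'phone"+# ### ###-####': 2540,
        'phone"(###) ###-####': 2451,
    }
    for sig, n in merge_counts(sunday).most_common():
        print(f"{n:8d} {sig}")
```

Formats that differ in other ways, such as a missing "+1" or parentheses around the area code, stay in separate categories.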
