[Talk-ca] Preferred phone number format

Matthew Darwin matthew at mdarwin.ca
Mon Feb 5 04:37:53 UTC 2018


Just an update on this activity.

  * People have been quite creative with the tags used for phone
    numbers, so the clean-up is taking longer than I originally
    expected. It was good to find all these odd tags: Phone,
    alt_phone, phone_1, phone_2, phone:tollfree, phone:toll-free, etc.
  * Radio stations and other entries whose phone field also holds
    explanatory text now look like this:
    +#-###-###-#### (office);+#-###-###-#### (on-air studio)
  * When a location had multiple phone numbers and one was toll-free,
    I put the toll-free number in phone:tollfree, since that key already
    seemed to be in some use (it now appears ~100 times in Canada).
    Alternatively, I could consolidate all the toll-free numbers into
    the regular phone field. Suggestions welcome
    (https://taginfo.openstreetmap.org/search?q=tollfree).
  * For phone numbers with the wrong number of digits: if I could
    figure out what was wrong (e.g. there was a web site listed that I
    could check), I fixed it. In a dozen cases I couldn't make sense of
    the number and deleted it. I also deleted values like "+1-" that
    contained no real number. Where no area code was listed, I left the
    number as 7 digits (someone local can probably fix it easily).
    Phone numbers of "911" were also removed.
  * We are now down to ~140 unique formats. Comparing this with the
    ~400 formats I mentioned initially is a bit misleading, since that
    figure didn't include all the other tags I found and fixed along
    the way. I also forgot to include relations in my initial
    query... they're in there now.
  * Using JOSM for editing: its regular expression search and to-do
    list work quite well for this task, although eliminating
    non-printable characters from the values took a while to figure
    out (there were also values with trailing spaces).
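The clean-up steps in the list above (dropping non-printable characters and trailing spaces, reducing numbers to the common +#-###-###-#### layout, using "x" for extensions) can be sketched in Python roughly as follows. This is a minimal sketch, not the JOSM regular expressions actually used; the function names are hypothetical, it assumes one North American (+1) number per value, and the single space before the "x" is an assumption.

```python
import re
from typing import Optional

# Trailing extension written as "x123", "ext 123" or "ext. 123" (any case).
EXT_RE = re.compile(r"\s*(?:ext\.?|x)\s*(\d+)\s*$", re.IGNORECASE)


def clean_value(raw: str) -> str:
    """Drop non-printable characters (zero-width spaces, tabs, etc.) and trim spaces."""
    return "".join(ch for ch in raw if ch.isprintable()).strip()


def normalize_nanp(raw: str) -> Optional[str]:
    """Rewrite one North American number as +1-###-###-####, keeping an "x" extension.

    Returns None for anything that does not reduce to ten digits (seven-digit
    numbers without an area code, vanity numbers, "911", ...) so those can be
    left untouched for manual review.
    """
    value = clean_value(raw)
    ext = ""
    match = EXT_RE.search(value)
    if match:
        ext = " x" + match.group(1)  # assumption: a space before the "x"
        value = value[: match.start()]
    digits = re.sub(r"\D", "", value)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code; it is re-added below
    if len(digits) != 10:
        return None
    return f"+1-{digits[:3]}-{digits[3:6]}-{digits[6:]}{ext}"
```

For example, "(613) 555-0123" and "+1 613 555 0123" both come out as "+1-613-555-0123", while "1-555-GOT-BEER" returns None and would be left as-is.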

Here are the top 20 tag/format combinations as of ~4pm ET Sunday:

   10669 phone"+#-###-###-####
    4392 phone"+# ###-###-####
    4206 phone"###-###-####
    2970 phone"+# ### ### ####
    2540 phone"+# ### ###-####
    2451 phone"(###) ###-####
    1076 phone"##########
     659 phone"+# ### #######
     547 fax"+#-###-###-####
     522 contact:phone"+#-###-###-####
     516 phone"+###########
     456 phone"#-###-###-####
     446 phone"### ### ####
     378 fax"+# ###-###-####
     283 contact:phone"### ###-####
     260 phone"+# (###) ###-####
     200 fax"+###########
     186 phone"### ###-####
     170 phone"(###)###-####
     162 fax"+# ### ###-####
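The table appears to come from replacing every digit with "#" and counting identical key/pattern pairs. A minimal Python sketch of that kind of census follows; it assumes the key/value pairs have already been extracted from the data by some other means, and the PHONE_KEYS subset and function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical subset of keys to survey; extend with alt_phone, phone_1, etc.
PHONE_KEYS = ("phone", "fax", "contact:phone")


def format_of(value: str) -> str:
    """Replace every digit with '#' so numbers sharing a layout collapse together."""
    return re.sub(r"\d", "#", value)


def census(tags: Iterable[Tuple[str, str]]) -> Counter:
    """Count (key, format) pairs for the phone-like keys only."""
    return Counter((key, format_of(value)) for key, value in tags if key in PHONE_KEYS)


sample = [
    ("phone", "+1-613-555-0123"),
    ("phone", "+1-416-555-0199"),
    ("phone", "(613) 555-0123"),
    ("fax", "+1-613-555-0124"),
    ("name", "Joe's Diner"),
]

# Print the most frequent combinations, largest first.
for (key, fmt), n in census(sample).most_common(20):
    print(f"{n:>8} {key}={fmt}")
```

With this approach, len(census(tags)) gives the number of distinct key/format combinations.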


On 2018-01-31 11:09 PM, OSM Volunteer stevea wrote:
>> 	• There are additionally ~45 phone numbers that use letters instead of digits (eg 1-555-GOT-BEER)
>> 	• ";" separator is used occasionally to indicate multiple phone numbers.  " ", "," and "/" are also used.
>> 	• There are random comments in the phone number field (not sure where these really should be?)
>> 	• Extensions are represented generally by "x" or "ext" or "ext."
>> 	• There are fewer than 1000 phone numbers using contact:phone instead of phone, using ~40 unique formats
>> 	• I did not analyze phone_1 or fax or any other tags.
>> I will continue to clean up phone numbers across the country that are missing the leading +1 and/or are not in one of the 4 common formats listed above.  My thought is that
>> 	• I will leave the phone numbers of 1-555-GOT-BEER type.
>> 	• I will use ";" as multiple number separator.
>> 	• I will use "x" for extension.
>> 	• And I will be happy to clean up the wonky ones with lots of text in them if there is agreement on where that text should go.  Example for a radio station: "office (###) ###-####; on-air studio (###) ###-####"
>>
>> Feedback welcome.
> Those sound largely sane and well thought out to me.  (And I wrote phone number parsers for the NANP about 30 years ago, um, wait for it, in HyperTalk!)  The GOT-BEER style numbers are best left alone (imo), as smarter parsers eventually figure those out.  Yes, ; (semicolon) is a frequent separator in key:value pair value lists in OSM data.  Yes, x (choose a case; lower seems better and more common than upper) for extensions.  For the radio station/on-air studio stuff, I'd make the first part of each of these compound entries the phone number in one of the acceptable formats, followed by the extra descriptive text, even within a semicolon-separated list.  That's a pretty regular set of alphanumerics, and with maybe eight or ten rules (reasonable for a parser extracting machine-dialable phone numbers, if necessary) you're either done or at or above 99%, I'd be willing to wager (and I'm not a betting type, though I do play poker with friends and online).
>
> Nice job.
>
> SteveA
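SteveA's suggestion (number first, descriptive text after, entries separated by ";") is regular enough that a consumer can pull out dialable numbers with a couple of rules. The Python below is a minimal sketch under that assumption, not an existing OSM tool; the function names are hypothetical. Entries that do not start with a number are returned unparsed for review. keypad_digits uses the standard telephone keypad letters and is meant only for a value already known to be a single vanity number, since running it over labelled values would turn the labels into digits too.

```python
import re
from typing import List, Optional, Tuple

# Standard telephone keypad letter groups.
_KEYPAD = {
    letter: digit
    for digit, letters in {
        "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
        "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
    }.items()
    for letter in letters
}

# An entry must *start* with a number (optional "+", then digits and the usual
# punctuation, ending on a digit); anything after it is treated as a label.
_ENTRY_RE = re.compile(r"^\s*(\+?[\d\s().-]*\d)\s*(.*?)\s*$")


def keypad_digits(number: str) -> str:
    """Translate vanity letters to keypad digits, e.g. 'GOT-BEER' -> '468-2337'."""
    return "".join(_KEYPAD.get(ch, ch) for ch in number.upper())


def split_entries(value: str) -> List[Tuple[Optional[str], str]]:
    """Split a ';'-separated value into (dialable number, label) pairs.

    Entries that do not begin with a number come back as (None, text) so a
    mapper can review them by hand.
    """
    result: List[Tuple[Optional[str], str]] = []
    for part in value.split(";"):
        match = _ENTRY_RE.match(part)
        if match is None:
            result.append((None, part.strip()))
            continue
        number, label = match.groups()
        plus = "+" if number.startswith("+") else ""
        # Keep only the digits (and a leading "+") so the number can be dialled.
        result.append((plus + re.sub(r"\D", "", number), label))
    return result
```

For instance, "+1-613-555-0123 (office);+1-613-555-0199 (on-air studio)" splits into ("+16135550123", "(office)") and ("+16135550199", "(on-air studio)"), while the older "office (613) 555-0123" style is returned as (None, ...) for manual clean-up.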
