
#46 Add Telephone Number recognition to the RST parser

Default
closed-rejected
nobody
None
5
2015-09-24
2015-09-15
No

This proposal defines how phone numbers would be recognized and marked up in a document tree.

Two recognition modes are proposed for phone numbers: simple mode and explicit mode.

Simple Mode::

A phone number is defined as a sequence of digit groups, with consecutive groups separated by a single period (.) or dash (-). The regular expression for the match would be r'\d+([-.]\d+)*'.

The Simple Mode matching would only be allowed in the :Contact: field parsing.
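
A minimal sketch of the Simple Mode pattern, using Python's standard `re` module. The names `SIMPLE_PHONE` and `is_simple_phone` are illustrative assumptions and not part of Docutils; `fullmatch` is used on the assumption that the whole field content must be a number.

```python
import re

# Pattern quoted from the proposal; the name SIMPLE_PHONE is hypothetical.
SIMPLE_PHONE = re.compile(r'\d+([-.]\d+)*')


def is_simple_phone(text):
    """Return True if the stripped text consists solely of a simple-mode number."""
    return SIMPLE_PHONE.fullmatch(text.strip()) is not None


print(is_simple_phone("12.34.56.78.90"))   # True
print(is_simple_phone("514-555-1212"))     # True
print(is_simple_phone("(514) 555-1212"))   # False: parentheses and spaces are not allowed
```

Any bare run of digits, such as a year, also satisfies this pattern, which is one reason the proposal limits Simple Mode to the :Contact: field.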

Explicit Mode::

A phone number must be prefixed by the telephone markup to be recognized as a number. The markup takes the form of the three-letter mnemonic tel, which may optionally be capitalized and may optionally be followed by a period (.). This is followed by a colon and then the phone number as specified by the simple mode above. Its regular expression would be of the form r'[Tt]el\.?:\s*\d+([-.]\d+)*'.

The Explicit Mode matching would be across the entire document.
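
The Explicit Mode pattern can be sketched the same way. This is only an illustration: `EXPLICIT_PHONE` and `find_explicit_phones` are hypothetical names, and the number part of the proposal's pattern is wrapped in a named group here so that it can be extracted.

```python
import re

# Pattern quoted from the proposal, with the number wrapped in a named group.
# EXPLICIT_PHONE is a hypothetical name, not a Docutils identifier.
EXPLICIT_PHONE = re.compile(r'[Tt]el\.?:\s*(?P<number>\d+(?:[-.]\d+)*)')


def find_explicit_phones(text):
    """Return every number introduced by tel:, Tel:, tel.: or Tel.: in text."""
    return [m.group('number') for m in EXPLICIT_PHONE.finditer(text)]


print(find_explicit_phones("Call Tel.: 514-555-1212 or tel:12.34.56"))
# ['514-555-1212', '12.34.56']
```

As written, the pattern has no word boundary before the mnemonic, so text such as "hotel: 5" would also match. A real implementation would probably need to guard against that.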

Markup::

Telephone elements would be marked up in the Doctree as follows::

<reference refuri="tel:12.34.56.78.90">
12.34.56.78.90</reference>
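
To make the target markup concrete, the sketch below builds the same element with the standard library's `xml.etree.ElementTree`. This is a stand-in for illustration only: it does not use Docutils node classes, and `tel_reference` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def tel_reference(number):
    """Build a <reference> element whose refuri is the tel: URI of number."""
    node = ET.Element('reference', refuri='tel:' + number)
    node.text = number
    return node


print(ET.tostring(tel_reference('12.34.56.78.90'), encoding='unicode'))
# <reference refuri="tel:12.34.56.78.90">12.34.56.78.90</reference>
```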

Discussion

  • Günter Milde

    Günter Milde - 2015-09-21

    I would not want an "explicit mode" different from what is already implemented:

    The rST support for "standalone hyperlinks" is specified in
    http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#standalone-hyperlinks

    Although not listed explicitly, "tel:" is among the "known schemes, per the Official IANA Registry of URI Schemes" and indeed, the example tel:+45/123-4567 is converted to the XML <reference refuri="tel:+45/123-4567">tel:+45/123-4567</reference>.

    If a different URI and display is desired, the various hyperlink references all allow the "tel:" scheme in the URI, e.g. `+45-321/12345 <tel:+45-321/12345>`__.

    The "simple mode" could be considered as a special case, where valid "tel:" URIs are recognized also without the scheme. (Similar to email addresses but restricted to a "contact" field.)

     

    Last edit: Günter Milde 2024-08-09
  • Jeffrey C. Jacobs

    I agree with following the URI rules for other URI objects rather than adding an explicit mode. The existing markup for telephones and other URIs should be sufficient for resolving ambiguity, so all I would propose adding is special rules for the Contact field, since the Contact field already has a special "implied email" handler. Any numeric sequence conforming to a standard phone number, even one consisting only of a series of digits, should be treated as a phone number: there is already a separate Address field, so the Contact field should not contain an address. But this is something we may still want to discuss.

     
  • Günter Milde

    Günter Milde - 2015-09-22

    Actually, the "implied email" handler is not restricted to the :contact: field. An email address is recognized anywhere in the document (try it). See also states.py.
    This means that special handling of phone numbers in :contact: would be a novelty without precedent. This makes acceptance in the docutils core a bit harder. You will have to prove that the advantages over using a "normal" hyperlink reference outweigh the added complexity and the possibility of false positives.

    For the screenplay writer, the alternatives would be

    • use of conforming tel: URIs in the source,
    • use hyperlink reference markup (`+32/1234-34 <tel:+32/1234-34>`__) or similar (inconvenient),
    • post-process the content of :contact: in a transform (idea: if the complete content conforms to a telephone number, convert to a hyperlink reference)
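
The third alternative above can be sketched in plain Python. This is not a Docutils transform, only the decision it would have to make. The names `PHONE` and `contact_to_reference` are hypothetical, and the pattern simply mirrors the number shapes used as examples in this thread; it makes no claim of conformance to the tel: URI specification.

```python
import re

# Hypothetical pattern: optional "+", then digit groups separated by single
# "-", "." or "/" characters, as in the examples used in this thread.
PHONE = re.compile(r'\+?\d+(?:[-./]\d+)*')


def contact_to_reference(content):
    """Return (refuri, text) if the whole content is a phone number, else None."""
    text = content.strip()
    if PHONE.fullmatch(text):
        return ('tel:' + text, text)
    return None


print(contact_to_reference('+32/1234-34'))        # ('tel:+32/1234-34', '+32/1234-34')
print(contact_to_reference('noone@nothing.com'))  # None
```

Requiring the complete field content to match keeps false positives confined to fields that contain nothing but a number.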
     
  • Jeffrey C. Jacobs

    So we could then just say there is no simple mode: all references must be explicit, as in tel:514-555-1212, which is already parsed into a refuri node, as I understand what you're telling me. If one wants the writer to drop the tel: prefix, one has to use the embedded link syntax, `514-555-1212 <tel:514-555-1212>`__, which I assume binds the interpreted text into a self-referential hyperlink with this node:

    <reference refuri="tel:514-555-1212">
    ....514-555-1212</reference>

    (BTW, this phone number in the +1 exchange [US/Canada] is information for Montréal, PQ.)

     

    Last edit: Jeffrey C. Jacobs 2015-09-22
  • Günter Milde

    Günter Milde - 2015-09-23

    We could say:

    If you want the telephone number to become a hyperlink, you can use a valid "tel:" URI (like tel:+35/12345.6789-33). If the link text should be different from the "href" value, use a
    "hyperlink reference", either regular, anonymous or embedded.
    Example::

     :contact: `1234`__
     :address: at home
     ...
    
     __ tel:+33/456-1234
    

    Would this suffice?

     

    Last edit: Günter Milde 2024-08-09
  • Jeffrey C. Jacobs

    I think that's a clever way to get the linked number while keeping the nitty-gritty of how the hyperlink works somewhat opaque. So I do like it.

    My main motivation though is just allowing more than one form of contact, like:

    :contact: `(514) 555-1212`__
              noone@nothing.com
    ...

    __ tel:+1-514-555-1212

    This is a nice approach to solving the problem as stated, but if the only reason to have the markup is to distinguish the number from an email address for post-processing, it's a bit of a kludge.

    Recall that my main motivation is to make it easier for a Writer to pull out the docinfo data in order to populate a number of standard fields used in the Manuscript format. Clearly, just marking it up as an interpreted role could also work, but that seems too general to me. It may just mean that if you want to provide a phone number in a Manuscript document (for instance in the ODT writer, and I assume LaTeX, and clearly the same could hold for HTML templates), you have to mark it up as a standard URI using the tel: scheme.

    I'm not per se happy about making it so complicated, but I can't think of another solution that would avoid being unnecessarily esoteric.

     
  • Günter Milde

    Günter Milde - 2015-09-24

    How about using separate fields - even if these are not
    standard docutils-docinfo-fields::

    :author: A\. Hitchcock
    :phone: 123/456 789-0
    :email: ah@example.com
    

    Looks good with standard writers and makes it easy for human readers as well as post processing software to interpret the contact info.

    As the current behaviour (recognition of a tel: URI as a standalone hyperlink) is close to the proposed "explicit mode", and something along the lines of the "simple mode" is better implemented as a transform acting on a :contact: or :phone: docinfo field, I suggest closing this ticket.

     
  • Jeffrey C. Jacobs

    If we go with new fields, they of course won't get promoted to DocInfo and must be searched for by a writer. That could be a level of complexity we might later find unwarranted if these new fields become standardized, but we work with what we have. Since the result of this proposal is to keep the status quo, and adding new fields to docinfo is itself a different issue, I concur. I'm closing the ticket.

     
  • Jeffrey C. Jacobs

    • status: open --> closed-rejected
     
