#629 dollar sign escaping in canonical references

GREEN
closed-fixed
None
5 (default)
2014-06-30
2013-12-23
Hugh A. Cayless
No

At the end of http://www.tei-c.org/release/doc/tei-p5-doc/en/html/SA.html#SACR it states: "If there is a need for an actual string including a dollar sign followed by a digit that is not supposed to be replaced, the dollar sign should be written as %24." This is problematic because '$' is a reserved character according to RFC 3986. It's a so-called "sub-delimiter" (though not a commonly used one). What this means is that '$' in a URI will not necessarily be treated the same as '%24', and so we shouldn't recommend that substitution.
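The non-equivalence described above can be sketched in a few lines of Python. This is only an illustration: the example.org URIs are made up, and `urllib.parse.unquote` stands in for any component that happens to percent-decode a reference before the '$n' substitution step sees it.

```python
from urllib.parse import unquote

# Hypothetical URIs, for illustration only.
literal = "http://example.org/price/$1"    # '$' written raw (a reserved sub-delimiter)
encoded = "http://example.org/price/%241"  # '$' percent-encoded, as the Guidelines text suggests

# Without any decoding, the two references are simply different strings.
same_as_strings = literal == encoded

# A component that percent-decodes first erases the distinction, so a later
# '$n' substitution step could no longer tell the escaped form from a back-reference.
same_after_decoding = unquote(encoded) == literal

print(same_as_strings, same_after_decoding)
```

Whether a given processor sees '%24' or '$' therefore depends on where in the pipeline decoding happens, which is the ambiguity the ticket is about.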

It would be better either to escape the '$' the way most regex processors do, as '\$' or '$$', or to use '\1', '\2' etc. to reference capturing groups and recommend escaping that as '%5C'.
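As a rough sketch of how the '$$' option could work, the following Python handles the escape in the same pass as the '$n' back-references. The function names, the example patterns, and the restriction to single-digit groups are assumptions made for illustration; this is not taken from any TEI processor.

```python
import re

# One token is either '$$' (escaped dollar sign) or '$' plus a single digit.
# Single-digit groups are an assumption of this sketch.
_TOKEN = re.compile(r"\$(\$|[0-9])")


def expand_replacement(match, template):
    """Expand '$n' and '$$' in template in one left-to-right pass."""
    def substitute(token):
        what = token.group(1)
        if what == "$":
            return "$"  # '$$' yields a literal dollar sign
        return match.group(int(what)) or ""  # '$n' yields capturing group n
    return _TOKEN.sub(substitute, template)


def resolve(match_pattern, replacement_pattern, reference):
    """Return the expanded reference, or None if the pattern does not match."""
    m = re.fullmatch(match_pattern, reference)
    if m is None:
        return None
    return expand_replacement(m, replacement_pattern)


# Hypothetical patterns: '$$1' must survive as the literal text '$1'.
result = resolve(r"(\d+)\.(\d+)", "chapter$1.xml#cost-$$1-verse$2", "3.16")
print(result)  # chapter3.xml#cost-$1-verse16
```

Doing both substitutions in a single scan matters: if '$$' were unescaped first and '$n' expanded afterwards, the literal '$1' produced by '$$1' would be wrongly treated as a back-reference.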

Discussion

  • James Cummings
    2014-05-19

    Assigning to Hugh Cayless to bring to Council for quick explanation, vote, and implementation.

     
  • James Cummings
    2014-05-19

    • assigned_to: Hugh A. Cayless
     
  • Had to re-read this to remind myself of what it's about. The upshot is that '$' is a special character in URI syntax, and is not normally escaped. '%24' in a URI would mean 'a dollar sign not being used as a sub-delimiter'. Therefore '%24' and '$' might not be interpreted identically by software that has to parse URIs. The simple solution is to pick a different escape syntax for '$' and recommend that any Canonical Reference interpreter substitute that at the same time as it handles '$n' back-references. I like '$$'.

     
  • Syd Bauman
    2014-06-30

    Council agreed to '$$'

     
  • Syd Bauman
    2014-06-30

    • Group: AMBER --> GREEN
     
  • Fixed with r12919.

     
    • status: open --> closed-fixed