#432 Change regex flavor on @matchPattern

GREEN
closed-fixed
Martin Holmes
None
1(low)
2013-11-21
2013-02-05
John P. McCaskey
No

The spec says the regex in the match pattern must be a regular expression according to the W3C XML Schema Language. But that dialect of regex was tuned for a significantly different use. It has features not found in mainstream regex processors such as XSLT 2.0 and JavaScript, and it treats as implicit some things that must be explicit in other regex dialects. For example, an XML Schema regex presupposes what in other dialects would be a ^ at the beginning and a $ at the end.

A helpful comparison chart is here: http://www.regular-expressions.info/refflavors.html

The lack of opening and closing anchors would probably be the most troublesome difference.
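
To make the anchoring difference concrete, here is a minimal sketch in Python (chosen only for illustration; it is neither an XML Schema nor an XPath processor). It assumes that `re.fullmatch` is a fair stand-in for the implicitly anchored XML Schema behaviour and `re.search` for the unanchored behaviour of XPath 2.0's matches(). The pattern and the helper names are hypothetical.

```python
import re

# Hypothetical pattern, for illustration only.
PATTERN = r"[a-z]+\d+"


def matches_xsd_style(pattern, value):
    """Stand-in for XML Schema: the whole value must match (implicit ^ and $)."""
    return re.fullmatch(pattern, value) is not None


def matches_xpath_style(pattern, value):
    """Stand-in for XPath 2.0 matches(): a match anywhere in the value is enough."""
    return re.search(pattern, value) is not None


def anchor_explicitly(pattern):
    """Add the anchors that XML Schema leaves implicit.

    The (?: ) grouping is Python syntax; other dialects may need plain
    parentheses instead.
    """
    return "^(?:" + pattern + ")$"


value = "see abc123 here"
print(matches_xsd_style(PATTERN, value))                       # whole value must match
print(matches_xpath_style(PATTERN, value))                     # substring match suffices
print(matches_xpath_style(anchor_explicitly(PATTERN), value))  # anchored by hand
```

The same pattern text therefore accepts or rejects the same value depending on which convention the processor follows, which is why a pattern written for one dialect may need explicit anchors when moved to the other.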

I suggest the current language be changed to:

@matchPattern should use only common-denominator features widely available in regular expression processors.

Unfortunately, there is no spec for a common-denominator subset. If it's felt the TEI spec must cite some standard and preferably one in the XML family, cite XPath 2.0. Most of the unique features it has (such as its Unicode support) are unlikely to be used on datapointers.

Discussion

(Page 2 of 3)
  • Same as bugs/601

     
  • Martin Holmes
    2013-11-13

    Council 2013-11-13: MH will write to TEI-L to check whether anyone has actually depended on the limitations of the XML Schema version of regex; if not, implement this. Noted that processing in the Stylesheets is already being done with XSLT2, so is assuming XPath regex patterns.

     
  • Martin Holmes
    2013-11-13

    • assigned_to: Lou Burnard --> Martin Holmes
     
  • Martin Holmes
    2013-11-13

    Message sent to TEI-L 2013-11-13. If no objections by 2013-11-20, the change should be made.

     
  • We're not in the business of linking this to any particular technology. We're just saying "this is a regexp, according to the conventions of XPath; evaluate it however you find convenient". If you use pure XSLT 1.0, it is likely you won't be able to produce an implementation of this TEI markup, but that is true whichever regexp notation we choose.

     
  • Serge Heiden
    2013-11-13

    OK, let's keep designing ethereal conventions.

     
  • "Ethereal conventions"? That's deeply unfair, Serge! This TEI feature simply uses regular expressions, which have been more or less unchanged for 30 years or more and are implemented in almost every language under the sun. So XSLT 1.0 does not support them: it is the outlier, not the TEI.

    All we have done here is say which formulation of regex rules to follow, but the vast majority of regex is the same the world over. We are NOT, at all, saying that you must use XSLT 2 / XPath 2 to implement this. It's just as easy to do in JavaScript, as John M notes in one of these tickets.
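
As a rough illustration that no XSLT processor is required, here is a minimal Python sketch of evaluating a match pattern outside the XML stack. It assumes a companion replacement string that uses $1-style group references; that syntax, the sample pattern, and the function names are assumptions made for this example and are not taken from this ticket.

```python
import re


def to_python_replacement(replacement):
    """Convert $N group references (assumed syntax) to Python's \\g<N> form."""
    # Escape backslashes first so any in the input stay literal.
    escaped = replacement.replace("\\", "\\\\")
    return re.sub(r"\$(\d)", r"\\g<\1>", escaped)


def resolve(match_pattern, replacement_pattern, value):
    """Return the rewritten value, or None if the pattern does not match."""
    if re.search(match_pattern, value) is None:
        return None
    return re.sub(match_pattern, to_python_replacement(replacement_pattern), value)


# Hypothetical pattern and replacement, with the anchors written out explicitly.
print(resolve(r"^([A-Za-z]+)\.(\d+)$", "#$1_$2", "Matt.5"))
```

Only the group-reference syntax needs translating here; the pattern itself is passed through unchanged because it sticks to features that the common dialects share.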

     
  • Martin Holmes
    2013-11-13

    Serge, I get that you're annoyed about something, but I'm not really clear on exactly what is upsetting you. Are you saying that we must specify e.g. XPath 2.0? Since there's no way (to my knowledge) to validate a regular expression with a schema or a Schematron rule, I see no reason why we shouldn't leave this to the user; right now it could only mean XPath 2.0, but a future version of XPath might introduce more features, and I see no reason not to allow people to use them without our having to change our specification. We say that @style contains (by default) CSS code, but we don't specify a CSS version, or a specific collection of CSS modules. Anyone who wants to be more precise than P5 cares to be can be so in their ODD, surely?

     