Crippled regex

  • I felt this was ridiculous. All my regexes worked well with prior versions of Saxon (7.8), and Java supports them all well, yet now I suddenly have to redo them just because the XPath flavour of regex is so crippled (e.g., there is no \b word-boundary match, which is not easy to replace).
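    To make the \b point concrete, here is a small Java sketch. The class name and sample words are purely illustrative. In Java, \b is a zero-width assertion. The usual XPath-compatible substitute, (^|\W)cat(\W|$), has to consume the neighbouring non-word character instead, so two matches that share a separator cannot both be found. Both patterns are run through java.util.regex here. XPath's \W is defined over Unicode categories and is not identical to Java's default \W, so this only approximates what fn:replace would do with the emulated pattern.

```java
public class WordBoundaryDemo {
    // Java regex: \b asserts a word boundary without consuming any character.
    public static String withBoundary(String s) {
        return s.replaceAll("\\bcat\\b", "dog");
    }

    // XPath-style emulation (no \b, no lookaround): match the neighbouring
    // non-word character (or the start/end anchor) and put it back via $1/$2.
    public static String emulated(String s) {
        return s.replaceAll("(^|\\W)cat(\\W|$)", "$1dog$2");
    }

    public static void main(String[] args) {
        String input = "cat cat concat";
        System.out.println(withBoundary(input)); // dog dog concat
        System.out.println(emulated(input));     // dog cat concat
    }
}
```

    The emulation leaves the second "cat" alone because the first match already consumed the space that would have served as its left-hand boundary.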

    So, I went in and created an option to allow full regexes. This simply shortcuts the RegexTranslator function. The command line option is:


    and the FeatureKey is

    The diffs are on the Patches page here.

    Thanks for your consideration.

    • Thanks Mike, I knew you wouldn't integrate this. If you think sending you documentation and test cases would make a difference, I will do so, but at this point I don't think it would.

      Thanks for the link to public-qt-comments. I shall submit something there.

      BTW: extension functions for match and replace are of course easy to add. The reason I didn't go down that route is that I'm running into this with the analyze-string instruction, and I'm not going to duplicate that entire instruction...
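      The following is a minimal sketch of such a pair of extension functions. It assumes Saxon's reflexive Java extension mechanism, which binds a namespace URI of the form java: followed by a fully qualified class name to a class of static methods. The class JavaRegex and its methods are hypothetical, and whether reflexive calls are available depends on the Saxon version and edition. As noted above, this only covers matching and replacing. It does nothing for xsl:analyze-string.

```java
import java.util.regex.Pattern;

// Hypothetical helper class: static wrappers around java.util.regex so that
// full Java syntax (\b, lookaround, etc.) can be used from a stylesheet.
public final class JavaRegex {
    private JavaRegex() {}

    // Analogue of fn:matches: true if any substring of input matches regex.
    public static boolean matches(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Analogue of fn:replace; the replacement uses Java's $n group references.
    public static String replace(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }
}
```

      From a stylesheet this might be called as jre:replace(., '\bcat\b', 'dog') with xmlns:jre="java:JavaRegex" in scope. Unlike fn:replace, it will not raise an error for a regex that matches the empty string.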

    • Done: comment submitted to public-qt-comments. But isn't it already too late (isn't the last call over)?

      • Michael Kay

        There will be a second last-call period, and I'm sure the WGs will consider this comment. There's quite a strong "no new features" lobby that will resist the changes, but the argument that it's less work for implementors to provide these features than to disable them may carry some weight.

        One of the obstacles to adoption will be the work involved in writing the specification: in the past we have found it very difficult to pick up off-the-shelf documentation of regex features of adequate quality. Perl tends to describe things very informally (sometimes even mentioning bugs in the implementation), and Java tends to reference Perl. If the WG writes a spec that turns out to be inconsistent with the existing implementations, that does more harm than not doing it at all.

        Michael Kay

    • Michael Kay

      Just to expand on my comment on the "patches" page.

      I actually argued in the XSL WG that extensions to the regex syntax should be allowed, and I lost: the WG voted for interoperability - i.e. no extensions allowed. That might still be overturned (the XQuery WG places much lower weight on interop than the XSL WG) but at the moment, that's the way the specs stand.

      My policy has always been to exploit the extensibility features permitted by the spec to the full, and never to go beyond them. Saxon has a reputation for the highest level of conformance to the specs, and I intend to maintain that. I've no problem with a library of third-party extension functions that provide an enhanced regex syntax, but I'm not putting in a feature that makes Saxon non-conformant, even if the user has to switch it on explicitly.

      I'm also not going to integrate contributed code unless it comes with user documentation and test cases. I often spend longer writing the documentation and tests for a new feature than writing the code.

      If you feel there are features missing from the XPath regex syntax, it would be more constructive to raise comments on public-qt-comments asking for them to be added, and giving justification.

      Michael Kay