Well, this interface wasn't really designed as a public API, but rather to support the functionality of replace(), tokenize(), xsl:analyze-string etc. In particular, the regex.analyze() method is used to support xsl:analyze-string, and it returns the sequence of matching/non-matching substrings, with the isMatching() method of the iterator telling you whether you are currently processing a match or a non-match.

The matching and non-matching pieces are never "part of" a group. Rather, each matching piece can be further subdivided into substrings (groups) to determine which parts of the piece match which parts of the regex.

The methods on ARegularExpression correspond directly to the matches/replace/tokenize/analyze-string functions in XSLT/XPath, and none of them seems to do precisely what you are looking for.
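
For what it's worth, here is a minimal sketch of one way to cut a value into grouped and ungrouped pieces. It does not use the Saxon regex package at all: it swaps in the JDK's java.util.regex and relies on the group offsets from Matcher.start(int) and Matcher.end(int). The class and method names (GroupPieces, Piece, split) are made up for illustration. Java's regex dialect differs from the XPath one, so this only applies to expressions that mean the same thing in both, as the one in the question does. Nested groups are simply skipped here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Saxon: uses java.util.regex, not XPath regex syntax.
public class GroupPieces {

    /** One piece of the input; group is 0 when the piece lies outside every capturing group. */
    static final class Piece {
        final String text;
        final int group;

        Piece(String text, int group) {
            this.text = text;
            this.group = group;
        }

        @Override
        public String toString() {
            return "Piece: " + text + (group == 0 ? " (no group)" : " (group #" + group + ")");
        }
    }

    /** Splits value into pieces, provided the whole value matches regex; otherwise returns an empty list. */
    static List<Piece> split(String regex, String value) {
        List<Piece> pieces = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(value);
        if (!m.matches()) {
            return pieces;
        }
        int pos = 0; // end of the last piece emitted so far
        for (int g = 1; g <= m.groupCount(); g++) {
            int start = m.start(g);
            // Skip groups that did not participate (start == -1) and groups
            // that begin inside a group already emitted (nested groups).
            if (start < pos) {
                continue;
            }
            if (start > pos) {
                pieces.add(new Piece(value.substring(pos, start), 0));
            }
            pieces.add(new Piece(m.group(g), g));
            pos = m.end(g);
        }
        if (pos < value.length()) {
            pieces.add(new Piece(value.substring(pos), 0));
        }
        return pieces;
    }

    public static void main(String[] args) {
        for (Piece p : split("/foo/([a-z]+)/([0-9]+)", "/foo/bar/1234")) {
            System.out.println(p);
        }
    }
}
```

Run against the regex and value from the question, this yields four pieces: "/foo/" (no group), "bar" (group 1), "/" (no group) and "1234" (group 2).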

Michael Kay

On 21 Dec 2013, at 19:30, Florent Georges <lists@fgeorges.org> wrote:


  I am trying to use the new Regex package in Saxon 9.5, but I don't
really understand the iterator model used by RegularExpression.analyze().
What I would like to achieve is to match a value against an XPath
regex, and then get all the pieces of the matched value, based on
the groups in the regex.  For each "piece", I need the info "am I part
of a group?" and, if yes, which one.  For instance:

    Regex: /foo/([a-z]+)/([0-9]+)
    Value: /foo/bar/1234

  Expected result:

    Piece: /foo/ (no group)
    Piece: bar   (group #1)
    Piece: /     (no group)
    Piece: 1234  (group #2)

  What I tried so far:

    public static void main(String[] args)
            throws Exception
    {
        loop("/foo/bar/1234", "/foo/([a-z]+)/([0-9]+)");
    }

    private static void loop(String value, String lexical)
            throws Exception
    {
        RegularExpression regex =
            new ARegularExpression(lexical, "", "XP30", null);
        RegexIterator it = regex.analyze(value);

        StringValue s1;
        while ( (s1 = (StringValue) it.next()) != null ) {
            System.out.println("Value: " + s1);
            SequenceIterator groups = it.getRegexGroupIterator();

            StringValue s2;
            while ( (s2 = (StringValue) groups.next()) != null ) {
                System.out.println("   Item: " + s2);
            }
        }
    }

  This outputs the following:

    Value: "/foo/bar/1234"
      Item: "bar"
      Item: "1234"

  So the first-level iterator "loops" over the entire string, as a
whole, and the second-level iterator loops over the "grouped values"
only.  But I am not sure how to cut the initial value into pieces
based on groups.

  Is that possible at all?

Florent Georges

saxon-help mailing list archived at http://saxon.markmail.org/