Possible bug saxonb 9-1-0-7 Java

Help
pvallone
2010-09-16
2012-10-08
  • pvallone
    2010-09-16

    Hi, I have a recursive template and function that analyzes a string and if it
    meets certain conditions it makes the word title case. For example:

    "THE BOY AND THE MOON (BATM) car for to the END"

    would return:

    "The Boy And The Moon (BATM) Car for to The End"

    My templates work in XMLSpy but return nothing in Saxon 9-1-0-7.

    Here is my input:

    <?xml version="1.0" encoding="UTF-8"?>
    <root>
        <data>THE BOY AND THE MOON (BATM) car for to the END</data>
    </root>
    

    Here is my XSLT:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:mf="myfunct" exclude-result-prefixes="mf fn xs">
        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
        <xsl:template match="/">
            <xsl:for-each select="/root/data">
                <return>
                    <xsl:call-template name="titlecase">
                        <xsl:with-param name="text" select="."/>
                    </xsl:call-template>
                </return>
            </xsl:for-each>
        </xsl:template>
        <xsl:template name="titlecase">
            <xsl:param name="text" as="xs:string"/>
            <xsl:for-each select="tokenize($text, '[\W\w+]')">
                <xsl:analyze-string select="." regex="([(][A-Z]+[)])">
                    <xsl:matching-substring>
                        <xsl:value-of select="."/>
                    </xsl:matching-substring>
                    <xsl:non-matching-substring>
                        <xsl:value-of select="replace(concat(mf:titlecase(., ('and', 'or', 'for', 'to')), ' '), '  ', ' ')"/>
                    </xsl:non-matching-substring>
                </xsl:analyze-string>
            </xsl:for-each>
        </xsl:template>
        <xsl:function name="mf:titlecase" as="xs:string">
            <xsl:param name="text" as="xs:string"/>
            <xsl:param name="ignore-list" as="xs:string*"/>
            <xsl:value-of>
                <xsl:analyze-string select="$text" regex="\w+">
                    <xsl:matching-substring>
                        <xsl:analyze-string select="." regex="{string-join($ignore-list, '|')}">
                            <xsl:matching-substring>
                                <xsl:value-of select="."/>
                            </xsl:matching-substring>
                            <xsl:non-matching-substring>
                                <xsl:sequence select="concat(upper-case(substring(., 1, 1)), lower-case(substring(., 2)))"/>
                            </xsl:non-matching-substring>
                        </xsl:analyze-string>
                    </xsl:matching-substring>
                    <xsl:non-matching-substring>
                        <xsl:value-of select="."/>
                    </xsl:non-matching-substring>
                </xsl:analyze-string>
            </xsl:value-of>
        </xsl:function>
    </xsl:stylesheet>
    

    Thoughts?

     
  • pvallone
    2010-09-16

    After further investigation, I do not believe this is a bug, but rather a
    difference in how each processor handles regex.
    When I changed the regex to the following, it worked:

    <xsl:for-each select="tokenize($text, '[\W+][(][)]')">
    
     
  • Michael Kay
    2010-09-16

    tokenize($text, '[\W\w+]')

    is clearly nonsense: the regular expression matches any character that is
    either a word character or a non-word character or a "+" sign; that is, it
    matches any character, and therefore tokenize() applied to a string of 5
    characters returns a sequence of 6 zero-length strings. Hence the lack of any
    output.

    tokenize($text, '[\W+][(][)]')
    

    doesn't seem much better. The regex matches a single character that is
    either a non-word character or a "+" sign, followed by "(" followed by ")".
    No such sequence occurs in your input, so the tokenize() will return a
    single token equal to the input string.
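
    As a minimal sketch of the two results described above (not part of the
    posted stylesheet; the literal 'abcde' is only a stand-in for any
    5-character string), something like the following could be run against the
    posted input document:

    ```xml
    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="text"/>
        <xsl:template match="/">
            <!-- Every character matches the separator regex, so only empty tokens remain. -->
            <xsl:variable name="all-separators" select="tokenize('abcde', '[\W\w+]')"/>
            <!-- Number of tokens -->
            <xsl:value-of select="count($all-separators)"/>
            <xsl:text>&#10;</xsl:text>
            <!-- Total length of all tokens joined together -->
            <xsl:value-of select="string-length(string-join($all-separators, ''))"/>
            <xsl:text>&#10;</xsl:text>
            <!-- The separator never occurs, so the whole input comes back as one token. -->
            <xsl:value-of select="count(tokenize(/root/data, '[\W+][(][)]'))"/>
        </xsl:template>
    </xsl:stylesheet>
    ```

    Under XPath 2.0 tokenize() semantics this should print 6, 0 and 1 on
    separate lines: six zero-length tokens for the first regex, and a single
    token for the second.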

    I can't see why you're trying to process the string through three separate
    regular expressions. You seem to be making it vastly more complicated than it
    needs to be.
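
    One possible single-pass version, as a sketch only: it assumes the intent
    is to leave "(ABC)"-style acronyms and the lower-case words from the
    original ignore list untouched, and to title-case every other word. The
    combined regex and the global $ignore-list variable are illustrative
    choices, not taken from the posted code.

    ```xml
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="xs">
        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
        <!-- Words left exactly as written (the comparison is case-sensitive). -->
        <xsl:variable name="ignore-list" as="xs:string*" select="('and', 'or', 'for', 'to')"/>
        <xsl:template match="/">
            <xsl:for-each select="/root/data">
                <return>
                    <!-- One regex: either a parenthesised acronym or a plain word. -->
                    <xsl:analyze-string select="." regex="\([A-Z]+\)|\w+">
                        <xsl:matching-substring>
                            <xsl:choose>
                                <!-- Acronyms and ignored words pass through unchanged. -->
                                <xsl:when test="starts-with(., '(') or . = $ignore-list">
                                    <xsl:value-of select="."/>
                                </xsl:when>
                                <!-- Everything else: first letter upper case, rest lower case. -->
                                <xsl:otherwise>
                                    <xsl:value-of select="concat(upper-case(substring(., 1, 1)), lower-case(substring(., 2)))"/>
                                </xsl:otherwise>
                            </xsl:choose>
                        </xsl:matching-substring>
                        <!-- Whitespace and punctuation between words are copied as they are. -->
                        <xsl:non-matching-substring>
                            <xsl:value-of select="."/>
                        </xsl:non-matching-substring>
                    </xsl:analyze-string>
                </return>
            </xsl:for-each>
        </xsl:template>
    </xsl:stylesheet>
    ```

    For the posted input, the return element should then contain
    "The Boy And The Moon (BATM) Car for to The End". Because the non-matching
    substrings are copied verbatim, the original spacing is preserved without
    any concat() or replace() clean-up.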

     
  • Michael Kay
    2010-09-16

    PS: neither your template nor your function is recursive.

     
  • pvallone
    2010-09-17

    Note to self: you've been schooled!

    Thanks Michael - if

    tokenize($text, '[\W+][(][)]')
    

    doesn't make much sense, then how can I match a word that is in parentheses?

    e.g. (BATM) or (XSLT)

    Thx

     
  • gertone
    2010-09-17

    This has really become a question for the Mulberry Technologies XSL-List,
    not for this forum.

    You seem to be missing the concept of what tokenize() does.
    tokenize() splits a string into a sequence (put simply); the regex in the
    second parameter describes the separator (note that you lose the
    separator).
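
    A small illustration of that point, as a sketch (the sample string is
    made up for the example and any input document will do):

    ```xml
    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="text"/>
        <xsl:template match="/">
            <!-- '\s+' describes the separator: runs of whitespace are dropped,
                 and only the words between them are returned as tokens. -->
            <xsl:value-of select="tokenize('THE BOY  AND THE MOON', '\s+')" separator="|"/>
        </xsl:template>
    </xsl:stylesheet>
    ```

    This should print THE|BOY|AND|THE|MOON: five tokens, with the original
    whitespace (including the double space) gone.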

    I have rewritten your code as I think you need it.
    It is not rocket science, but I think the way I have written it should
    help you understand the use of tokenize() better.

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:fn="http://www.w3.org/2005/xpath-functions"
        xmlns:mf="myfunct" exclude-result-prefixes="mf fn xs" > 
        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
        <xsl:variable name="ignore-list" select="('to', 'for', 'on', 'the', 'if', 'when', 'a', 'an', 'and')"></xsl:variable>
        <xsl:template match="/">
            <xsl:for-each select="/root/data">
                <return>
                    <xsl:analyze-string select="/root/data" regex="(\(\w+\))">
                        <xsl:matching-substring>
                            <xsl:value-of select="regex-group(1)"/>
                        </xsl:matching-substring>
                        <xsl:non-matching-substring>
                            <xsl:for-each select="tokenize(., '\s+')">
                                <xsl:choose>
                                    <xsl:when test="lower-case(.) = $ignore-list and not(position() = 1)">
                                        <xsl:value-of select="lower-case(.)"/>
                                    </xsl:when>
                                    <xsl:otherwise>
                                        <xsl:value-of select="upper-case(substring(.,1,1))"/>
                                        <xsl:value-of select="lower-case(substring(.,2))"/>
                                    </xsl:otherwise>
                                </xsl:choose>
                                <xsl:text> </xsl:text>
                            </xsl:for-each>
                        </xsl:non-matching-substring>
                    </xsl:analyze-string>
                </return>
            </xsl:for-each>
        </xsl:template>
    </xsl:stylesheet>
    
     
  • gertone
    2010-09-17

    Please remove the following from my code (along with its closing tag):
    <xsl:for-each select="/root/data">
    It is a legacy from your code, but it doesn't do anything here.
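
    With that change applied, the stylesheet might look roughly like this. It
    is only a sketch: the unused namespace declarations are dropped for
    brevity, and it assumes the input has exactly one /root/data element,
    since the select of xsl:analyze-string must yield a single string.

    ```xml
    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
        <xsl:variable name="ignore-list" select="('to', 'for', 'on', 'the', 'if', 'when', 'a', 'an', 'and')"/>
        <xsl:template match="/">
            <return>
                <!-- Split the text into "(WORD)" groups and everything in between. -->
                <xsl:analyze-string select="/root/data" regex="(\(\w+\))">
                    <xsl:matching-substring>
                        <!-- Parenthesised words are copied unchanged. -->
                        <xsl:value-of select="regex-group(1)"/>
                    </xsl:matching-substring>
                    <xsl:non-matching-substring>
                        <!-- Whitespace is the separator here, so it is re-added below. -->
                        <xsl:for-each select="tokenize(., '\s+')">
                            <xsl:choose>
                                <xsl:when test="lower-case(.) = $ignore-list and not(position() = 1)">
                                    <xsl:value-of select="lower-case(.)"/>
                                </xsl:when>
                                <xsl:otherwise>
                                    <xsl:value-of select="upper-case(substring(.,1,1))"/>
                                    <xsl:value-of select="lower-case(substring(.,2))"/>
                                </xsl:otherwise>
                            </xsl:choose>
                            <xsl:text> </xsl:text>
                        </xsl:for-each>
                    </xsl:non-matching-substring>
                </xsl:analyze-string>
            </return>
        </xsl:template>
    </xsl:stylesheet>
    ```

    If the input can contain several data elements, keeping the for-each and
    changing the analyze-string select to "." would be the alternative.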

     
  • pvallone
    2010-09-17

    Thanks. I appreciate the help.

    Regards,

    Phil