Thanks for this information. This suggests that I should change the implementation of contains() to use Java's regular expression handling code. It currently uses
s0.indexOf(s1) >= 0
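
For illustration only, here is a minimal sketch of the two approaches side by side using java.util.regex. The class and method names are hypothetical and this is not Saxon's actual code. It assumes Pattern.LITERAL is an acceptable way to treat the search string as plain text, so that no metacharacter is interpreted.

```java
import java.util.regex.Pattern;

final class ContainsSketch {
    // The approach quoted above: plain substring search.
    static boolean containsIndexOf(String s0, String s1) {
        return s0.indexOf(s1) >= 0;
    }

    // Hypothetical regex-based alternative. Pattern.LITERAL makes every
    // character of s1 match itself, so "a.c" only matches the text "a.c".
    static boolean containsRegex(String s0, String s1) {
        return Pattern.compile(s1, Pattern.LITERAL).matcher(s0).find();
    }

    public static void main(String[] args) {
        System.out.println(containsIndexOf("xxaaayy", "aaa"));
        System.out.println(containsRegex("xxaaayy", "aaa"));
        // The dot is not a wildcard here because of Pattern.LITERAL.
        System.out.println(containsRegex("abc", "a.c"));
    }
}
```

Both methods should agree for any pair of strings; which one is faster would still have to be measured.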
Did you take precautions to prevent the expressions being pre-evaluated at compile time? The best way to ensure this is to define the arguments using <xsl:param>, or of course to read them from the source document.
fn:current-time() is defined to return a time during the execution of the transformation. Successive calls in the same transformation return the same result. Yes, this makes it useless for performance work.
Michael Kay

From: Andre Cusson
Sent: 20 July 2004 23:42
Subject: RE: [saxon] contains vs matches


Thank you for the info.

I had a bit of time to write a small performance comparison test. On a 5 MB XML file, testing the string value of every node and attribute with each function in turn, fn:matches is consistently about 10% faster than fn:contains when its matching parameter is a simple string, e.g. 'aaa'. The results evened out a bit (e.g. +/- 5%) when I changed the matches expression to 'a{3}', but remained in favor of fn:matches. One surprising result was that when I changed the fn:matches expression to '[abc]', fn:matches became about 33% faster than the fn:contains(., 'aaa') version.

So it seems that I will replace those fn:contains calls with the corresponding fn:matches calls in my applications, except when collations are used or perhaps when characters would need to be escaped.
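
To make the escaping concern concrete, here is a small sketch in Java (java.util.regex, not XPath). The class, the helper escapeLiteral and the list of metacharacters are assumptions made for this example; the regex dialect used by fn:matches differs in some details, so the exact set of characters to escape should be checked against the specification.

```java
import java.util.regex.Pattern;

final class RegexEscapeSketch {
    // Characters treated as regex metacharacters in this sketch (an
    // assumption; adjust for the regex dialect actually in use).
    private static final String META = "\\|.?*+()[]{}^$";

    // Prefix every metacharacter with a backslash so that the result,
    // used as a regex, matches the original text literally.
    static String escapeLiteral(String literal) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (META.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Substring-style regex search, analogous to an unanchored match.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Unescaped, the dot matches any character, so "125" is a hit.
        System.out.println(found("1.5", "125"));
        // Escaped, only the literal text "1.5" is a hit.
        System.out.println(found(escapeLiteral("1.5"), "125"));
        System.out.println(found(escapeLiteral("1.5"), "1.5"));
    }
}
```

This is the case where a plain contains() call stays simpler: the search string can be passed through unchanged.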

Thank you.

PS: I am puzzled by the fact that fn:current-time() returns the startup (compile) time and not the current (run) time, so I had to resort to using java.util.Date for the test stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:saxon="http://saxon.sf.net/"
        xmlns:Date="java:java.util.Date"
        exclude-result-prefixes="xsl xs Date">

        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

        <saxon:script language="java" implements-prefix="Date" src="java:java.util.Date" xmlns:Date="java:java.util.Date"/>

        <xsl:variable name="source" select="document('content.xml')/*"/>

        <!-- Log a timestamp before and after each pass over the source document -->
        <xsl:template match="/">
                <xsl:message>start : <xsl:value-of select="Date:toString(Date:new())"/></xsl:message>
                <xsl:apply-templates select="$source" mode="contains"/>
                <xsl:message>mid : <xsl:value-of select="Date:toString(Date:new())"/></xsl:message>
                <xsl:apply-templates select="$source" mode="matches"/>
                <xsl:message>end : <xsl:value-of select="Date:toString(Date:new())"/></xsl:message>
        </xsl:template>

        <!-- First pass: evaluate contains() on each matched node, then recurse -->
        <xsl:template match="node()" mode="contains">
                <xsl:if test="contains(., 'aaa')"/>
                <xsl:apply-templates select="@*" mode="contains"/>
                <xsl:apply-templates mode="contains"/>
        </xsl:template>

        <!-- Second pass: evaluate matches() on each matched node, then recurse -->
        <xsl:template match="node()" mode="matches">
                <xsl:if test="matches(., 'aaa')"/>
                <xsl:apply-templates select="@*" mode="matches"/>
                <xsl:apply-templates mode="matches"/>
        </xsl:template>
</xsl:stylesheet>

At 05:51 PM 7/15/2004, you wrote:
I think you can only get the answer to the performance question by
measurement. I simply don't know. I think that if the regex used in
matches() is known at compile time, it should be pretty efficient (Saxon
precompiles the regex in this case).
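
As a rough illustration of why precompilation matters, the following Java sketch contrasts compiling a regex once with compiling it again for every string tested. It uses java.util.regex directly; the class and method names are hypothetical and this is not how Saxon is structured internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class PrecompiledRegexSketch {
    // Regex known up front: compile once, reuse the Pattern for every value.
    static int countPrecompiled(List<String> values, String regex) {
        Pattern compiled = Pattern.compile(regex);
        int hits = 0;
        for (String v : values) {
            if (compiled.matcher(v).find()) {
                hits++;
            }
        }
        return hits;
    }

    // Regex treated as unknown until each call: recompiled for every value.
    // Same result, but the compilation cost is paid once per string.
    static int countRecompiling(List<String> values, String regex) {
        int hits = 0;
        for (String v : values) {
            if (Pattern.compile(regex).matcher(v).find()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("aaa", "xaaay", "aa", "abc");
        System.out.println(countPrecompiled(values, "a{3}"));
        System.out.println(countRecompiling(values, "a{3}"));
    }
}
```

The two methods return the same counts; only the amount of repeated work differs, which is why a regex supplied as a literal in the stylesheet should be the cheaper case.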

The only thing that contains() allows that matches() doesn't is the use of
collations. But collations with substring matching are pretty messy anyway.

I suspect if contains() hadn't already been there in XPath 1.0 it wouldn't
have been included in 2.0.

Michael Kay

> -----Original Message-----
> From: Andre Cusson
> Sent: 14 July 2004 22:37
> Subject: [saxon] contains vs matches
> Hi,
> I am trying to understand the logic behind having two XPath functions,
> contains and matches.
> If I understand correctly, they are the same except that matches also
> matches regular expressions (of which literal strings are a subset,
> apart from possibly having to escape special characters). I am
> therefore inclined to think that either fn:contains is much faster
> than fn:matches (with an equivalent literal string as the matching
> expression) and should be preferred for performance reasons, or
> fn:matches should be used most of the time, unless one is trying to
> match special characters and prefers not having to escape them.
> Could anyone confirm or correct my assumptions?
> To tell the truth, I am hoping that fn:contains is much faster than
> fn:matches, but if it is, my next question will be: why can't
> fn:matches be optimized in the same way?
> Aren't there simple ways for matches to know whether special
> characters in the matching expression should be escaped or not, so
> that only one matching function is required?
> How much faster is fn:contains than fn:matches?
> Thank you,
> Andre
> -------------------------------------------------------
> This SF.Net email is sponsored by BEA Weblogic Workshop
> FREE Java Enterprise J2EE developer tools!
> Get your free copy of BEA WebLogic Workshop 8.1 today.
> _______________________________________________
> saxon-help mailing list
