Difficulty with resolve-uri() with Saxon

Help
Lea Hayes
2009-05-24
2012-10-08
  • Lea Hayes
    2009-05-24

    Hi,

    I have an unusual problem with resolve-uri(), and I was not sure if this was a bug with Saxon, or an error in my usage.

    The following line:
    <xsl:value-of select="resolve-uri('test.xml','file://C:\test\something')"/>
    renders the following, as expected:
    file:///C:/test/test.xml

    However, the following line:
    <xsl:value-of select="resolve-uri('test.xml','file://C:\test%20test\something')"/>
    causes the following error:
    Base URI {file://C:\test%20test\somethin...} is invalid: Illegal character in path at index 15: file:///C:/test test/test.xml

    Is this a problem with Saxon, or am I doing something wrong?

    Many thanks,
    Lea Hayes

     
    • Lea Hayes
      2009-06-11

      I understand that you are very busy. I was wondering whether you had found the source of this issue.

      Given the following transform:

      <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">
      <xsl:template match="/">
      [<xsl:value-of select="resolve-uri('test.xml','file:/c:/test folder with spaces/')"/>]
      </xsl:template>
      </xsl:transform>

      Saxon generates the following error from the .NET API:
      Base URI {file:/c:/test folder with spac...} is invalid: Illegal character in path at index 15: file:///c:/test folder with spaces/test.xml

      But Saxon generates the expected output from within the <oxygen/> XML Editor:
      [file:/c:/test%20folder%20with%20spaces/test.xml]

      As I understand it, <oxygen/> uses the Java version of the API. So, as you suspected earlier, perhaps this issue is confined to the .NET version of the Saxon API.

      I don't know if this is of any help.

      Many thanks,
      Lea Hayes

       
      • Michael Kay
        2009-06-16

        Sorry about the delay in responding to this.

        It turns out that the difference in behaviour is due to something Saxon does, not to the underlying platform: although Saxon uses different library routines on the two platforms, it escapes spaces as %20 before calling the relevant method on the Java platform, and fails to do the same on .NET.

        The code for Java has the rather unsatisfactory comment:

            // It's not entirely clear why we have to escape spaces by hand, and not other special characters;
            // it's just that tests with a variety of filenames show that this approach seems to work.
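For readers following along, the effect of that hand-escaping step can be sketched against java.net.URI directly. This is not Saxon's source code: the class and method names (`SpaceEscapeSketch`, `escapeSpaces`, `resolveEscaped`, `parses`) are hypothetical, and only spaces are escaped, mirroring what the comment describes.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch, not Saxon's code: escape spaces by hand, then let
// java.net.URI parse and resolve the result.
final class SpaceEscapeSketch {

    // Replaces each literal space with %20; every other character is left alone.
    static String escapeSpaces(String s) {
        return s.replace(" ", "%20");
    }

    // Escapes spaces in both arguments, then resolves the relative reference.
    static String resolveEscaped(String base, String relative) {
        URI baseUri = URI.create(escapeSpaces(base));
        return baseUri.resolve(escapeSpaces(relative)).toString();
    }

    // Returns true if java.net.URI accepts the string exactly as written.
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String base = "file:/c:/test folder with spaces/";
        System.out.println(parses(base)); // false: raw spaces are rejected
        System.out.println(resolveEscaped(base, "test.xml"));
    }
}
```

With the base URI from the transform above, `resolveEscaped` yields the same string that <oxygen/> printed.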
        

        The specification itself is a little unhelpful here, even as amended in erratum FO.E1:

        http://www.w3.org/XML/2007/qt-errata/xpath-functions-errata.html#E1

        The relevant rule is:

        If $base is not a valid URI according to the rules of the xs:anyURI data type, if it is not a suitable URI to use as input to the chosen resolution algorithm (for example, if it is a relative URI reference, if it is a non-hierarchic URI, or if it contains a fragment identifier), then an error is raised [err:FORG0002].
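As a rough illustration of the three examples named in that rule, the checks can be expressed with java.net.URI. This is a sketch under stated assumptions, not Saxon's implementation and not a complete test of xs:anyURI validity; `BaseUriSuitability` and `isSuitableBase` are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: applies the three examples from the quoted rule
// (relative reference, non-hierarchic URI, fragment identifier).
final class BaseUriSuitability {

    static boolean isSuitableBase(String candidate) {
        final URI uri;
        try {
            uri = new URI(candidate);
        } catch (URISyntaxException e) {
            return false; // not something java.net.URI can parse at all
        }
        return uri.isAbsolute()                  // has a scheme, so not a relative reference
                && !uri.isOpaque()               // hierarchical, unlike e.g. mailto:
                && uri.getRawFragment() == null; // carries no fragment identifier
    }

    public static void main(String[] args) {
        System.out.println(isSuitableBase("file:/c:/dir/"));      // true
        System.out.println(isSuitableBase("dir/file.xml"));       // false: relative
        System.out.println(isSuitableBase("mailto:a@b.com"));     // false: non-hierarchic
        System.out.println(isSuitableBase("file:/c:/dir/#part")); // false: fragment
    }
}
```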

        I think there's a missing "or" after the first comma. Now, your input is what I call a "wannabe-URI": a string that becomes a valid URI after special characters are escaped. As such, it's a valid URI according to the rules of the xs:anyURI data type, but it is not a valid URI according to the RFCs that define the URI resolution algorithm; and there appears to be no license in the spec to do what Saxon on Java is doing, namely escaping the URI to make it valid. So, regretfully, I think Saxon on Java has it wrong, and it's right to throw an error on .NET. You should be escaping the URI before attempting to resolve it, using the iri-to-uri() function.

        Surprisingly, however, if I try this, I get a new problem:

        FORG0002: Base URI {file:/c:/test%20dir/} is invalid: Illegal character in path at index 15: file:///c:/test dir/test.xml

        This seems to be because Saxon is taking the .NET System.Uri returned by the XmlUrlResolver, applying ToString() on it, and then passing the result to the Java java.net.URI constructor; it appears that the ToString() method unescapes the %20, making a wannabe-URI that is unacceptable to the Java constructor. So the .NET code is wrong too.
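The Java half of that failure can be reproduced in isolation. The sketch below only exercises the java.net.URI constructor; it does not involve System.Uri or Saxon, and `UnescapedSpaceFailure` and `failureIndex` are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch: reports where java.net.URI rejects a candidate string.
final class UnescapedSpaceFailure {

    // Returns the index of the offending character, or -1 if the string parses.
    static int failureIndex(String candidate) {
        try {
            new URI(candidate);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        // The unescaped form from the error message: the space sits at index 15.
        System.out.println(failureIndex("file:///c:/test dir/test.xml"));
        // The same URI with the space kept as %20 is accepted.
        System.out.println(failureIndex("file:///c:/test%20dir/test.xml"));
    }
}
```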

        I'm going to experiment with using the same (Java) code on both platforms. I suspect the reason it diverged was due to bugs in the GNU ClassPath library that are no longer present.

         
        • Lea Hayes
          2009-06-16

          > Sorry about the delay in responding to this.

          No problem.

          > This seems to be because Saxon is taking the .NET System.Uri returned by the XmlUrlResolver, applying ToString() on it, and then passing the result to the Java java.net.URI constructor; it appears that the ToString() method unescapes the %20, making a wannabe-URI that is unacceptable to the Java constructor. So the .NET code is wrong too.

          I don't understand the internal workings of Saxon; but would simply switching from "theUri.ToString()" to "theUri.AbsoluteUri" solve the problem? This version of the URI maintains the escaping.
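A similar split between a decoded view and an escaped view exists in java.net.URI, which may help show why the choice of accessor matters. This is only an analogy written in Java: it says nothing certain about System.Uri, and `RawVersusDecoded` with its methods is a hypothetical name.

```java
import java.net.URI;

// Hypothetical sketch: java.net.URI offers both decoded and raw accessors.
final class RawVersusDecoded {

    // getPath() decodes percent-escapes, so %20 comes back as a space.
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    // getRawPath() returns the path exactly as it appears in the URI.
    static String rawPath(String uri) {
        return URI.create(uri).getRawPath();
    }

    public static void main(String[] args) {
        String uri = "file:/c:/test%20dir/";
        System.out.println(decodedPath(uri)); // /c:/test dir/
        System.out.println(rawPath(uri));     // /c:/test%20dir/
    }
}
```

Only the raw form can be handed back to a URI parser without being escaped again.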

           
          • Michael Kay
            2009-06-16

            Thanks for the suggestion. That may be a less risky fix for a maintenance release.

             
    • Michael Kay
      2009-05-24

      Is this on Java or .NET? Saxon in both cases uses the URI library of the underlying platform.

      I'm surprised it works with backslashes - though .NET in particular is very tolerant of things that aren't legal according to the RFC. But the combination of backslash and %-escaping is asking a bit much.

      Legal URIs use only the forward slash as a path separator.
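On the Java side the strictness is easy to observe, since java.net.URI refuses a backslash outright. A minimal sketch, with hypothetical names (`BackslashCheck`, `isAcceptedByJavaUri`, `toForwardSlashes`); it checks syntax only and says nothing about how .NET treats the same string.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch: java.net.URI rejects backslashes in a URI string.
final class BackslashCheck {

    // Returns true if java.net.URI parses the string as written.
    static boolean isAcceptedByJavaUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Naive normalisation: swaps every backslash for a forward slash.
    static String toForwardSlashes(String s) {
        return s.replace('\\', '/');
    }

    public static void main(String[] args) {
        String original = "file://C:\\test\\something";
        System.out.println(isAcceptedByJavaUri(original));                   // false
        System.out.println(isAcceptedByJavaUri(toForwardSlashes(original))); // true
    }
}
```

Being accepted syntactically is not the same as being the intended URI: the text directly after `//` is the authority component, so `file:///C:/test/something` is the more usual spelling for a local drive path.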

       
    • Lea Hayes
      2009-05-24

      I am using the .NET library, but this problem also seems to occur within the <oxygen/> XML editor.

      In my actual transform the troublesome URI is not specified in the XML or the XSL. It is the default base URI:

      <xsl:value-of select="resolve-uri('test.xml',base-uri(.))"/>

      The default base URI uses forward slashes, but it is using %20 for spaces. Whether forward slashes or backslashes are used does not appear to make much difference in my previous test.

      The following URI works, but this does not help me because I need to use the base-uri(.) function:

      <xsl:value-of select="resolve-uri('test.xml','file://C:/testte~1/something')"/>

       
      • Michael Kay
        2009-05-24

        How do you invoke the transformation? The value returned by base-uri(.) comes from somewhere - it might be set explicitly via the API or be obtained from a filename, etc.

         
    • Lea Hayes
      2009-05-24

      The base-uri function is used in two different contexts:

      1 - From the input document, which is transformed using the API.

      2 - From within another document, which is opened by applying the doc() function to xlink:href attributes.

      Here is the code that I am using to invoke the API:

      XdmNode input = processor.NewDocumentBuilder().Build(new Uri(sourceUri));

      XsltTransformer transformer = processor.NewXsltCompiler().Compile(new Uri(xsltUri)).Load();

      transformer.InitialContextNode = input;
      transformer.BaseOutputUri = new Uri(sourceUri);

      Is URI escaping completely incompatible with the resolve-uri function?

      The following quote suggests that XSLT should support this (http://www.xsltfunctions.com/xsl/fn_doc.html):

      "If you are accessing documents on a file system, your implementation may require
      you to precede the file name with file:///, use forward slashes to separate directory
      names, and escape each space in the file name with %20."

       
      • Michael Kay
        2009-05-24

        So what's the value of sourceUri and xsltUri respectively?

        I think that resolve-uri() should accept a URI that has been percent-encoded. If it isn't accepting it, then I need to investigate why. It's currently doing it via a call on XmlUrlResolver.ResolveUri(). There's a comment in the code that suggests there's no really good reason for doing it differently on the .NET and Java platforms - on Java it's done using the resolve() method of class java.net.URI.
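For comparison, here is how the resolve() method of java.net.URI treats a percent-encoded base when called on its own. This is a standalone sketch, not the way Saxon wires it up; `ResolveSketch` is a hypothetical name, and the base URIs use the single-slash `file:/c:/...` spelling.

```java
import java.net.URI;

// Hypothetical sketch: plain java.net.URI resolution with an escaped base.
final class ResolveSketch {

    // Resolves a relative reference against an absolute, already-escaped base.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // The last path segment of the base ("something") is replaced.
        System.out.println(resolve("file:/c:/test%20test/something", "test.xml"));
        // With a trailing slash, the reference is appended to the directory.
        System.out.println(resolve("file:/c:/test%20test/", "test.xml"));
    }
}
```

Both calls give `file:/c:/test%20test/test.xml`; the %20 passes through resolution untouched.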

         
    • Lea Hayes
      2009-05-24

      String sourceUri = @"C:\Users\Administrator\Documents\doc\test.xml";
      String xsltUri = @"C:\Users\Administrator\Documents\doc\test.xsl";

      If the following line is added:
      Uri testSourceUri = new Uri(sourceUri);

      Then in debug mode, the following is true:
      testSourceUri.AbsoluteUri == "file:///C:/Users/Administrator/Documents/doc/test.xml"

      If I set sourceUri to another test document:
      String sourceUri = @"C:\Users\Administrator\Documents\doc\another folder\test.xml";

      Then in debug mode, the following is true:
      testSourceUri.AbsoluteUri == "file:///C:/Users/Administrator/Documents/doc/another%20folder/test.xml"
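Java has a comparable escaping step when a URI is built from components rather than parsed from a string. The sketch below uses the four-argument java.net.URI constructor and assumes the path has already been converted to forward slashes with a leading slash; `PathToFileUri` and `fromSlashPath` are hypothetical names. Note that Java prints the result with a single slash (`file:/C:/...`) rather than the `file:///C:/...` form shown by .NET.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch: build a file: URI from an absolute, slash-separated path.
final class PathToFileUri {

    // The multi-argument constructor quotes characters that are illegal in a
    // path, so a space becomes %20.
    static String fromSlashPath(String absolutePath) {
        try {
            return new URI("file", null, absolutePath, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            fromSlashPath("/C:/Users/Administrator/Documents/doc/another folder/test.xml"));
    }
}
```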

      But the error is reported even when the base URI is specified explicitly:

      resolve-uri('test.xml','file://C:/test%20test/something')