Optimizer error in XQuery, Saxon 9.4.0.1N?

2012-02-19
2012-10-08
  • Phil Pfeiffer
    2012-02-19

    A student of mine and I are both seeing the output of an XQuery query
    truncated prematurely. The following is my query:


    declare default element namespace "http://www.cs.etsu.edu/xquery-training-assignment/csci-faculty-staff-data/0.1";
    declare copy-namespaces no-preserve, inherit;
    declare option saxon:output "omit-xml-declaration=yes";

    <csci-faculty-staff-data>
    {
      for $e in /csci-faculty-staff-data/faculty-staff-member
      return element faculty-staff-member {
        attribute last {data($e/name/last)},
        attribute first {data($e/name/last)},
        for $ce in ($e/child::* except $e/name) return $ce
      }
    }
    </csci-faculty-staff-data>


    We're running this query against a file of our department's faculty data:
    size, 67,020 characters; record count: 33 faculty/staff records.

    When we run this query against the file at Saxon's default optimization
    level, which I presume to be 10, Saxon cuts off the output partway through
    the file's final record, like so:

    <preparations-by-semester>
    <semester>201210</semester>
    <preparations>
    <preparation><course-rubric>CSCI</course-r

    When I run the query with -opt:x for any value x in 0..9, the result is output
    properly, like so:

    <preparations-by-semester>
    <semester>201210</semester>
    <preparations>
    <preparation><course-rubric>CSCI</course-rubric><course-number>2200</course-number><number-of-sections>2</number-of-sections></preparation>
    <preparation><course-rubric>CSCI</course-rubric><course-number>4127</course-number></preparation>
    <preparation><course-rubric>CSCI</course-rubric><course-number>5127</course-number></preparation>
    </preparations>
    </preparations-by-semester>
    </preparations-by-semesters></faculty-staff-member></csci-faculty-staff-data>
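
    One way to tell the truncated output from the complete output without
    reading it by eye is a well-formedness check: output that stops inside a tag
    cannot parse as XML. The following is only a sketch using the JDK's built-in
    SAX parser. The class and method names are hypothetical and are not part of
    Saxon, and the sample strings are shortened stand-ins for the real output.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Saxon: reports whether a string parses as XML.
final class TruncationCheck {

    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // DefaultHandler rethrows fatal errors, so malformed input
            // surfaces here as a SAXException instead of being printed.
            factory.newSAXParser().parse(
                    new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // For example, input that ends in the middle of a tag.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the complete and the truncated output.
        String complete = "<preparation><course-rubric>CSCI</course-rubric></preparation>";
        String truncated = "<preparation><course-rubric>CSCI</course-r";
        System.out.println("complete:  " + isWellFormed(complete));
        System.out.println("truncated: " + isWellFormed(truncated));
    }
}
```

    In practice the string would be the contents of the redirected output file
    (temp.txt in the commands used in this thread).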

    Since the source file is 67,000+ bytes long, I've not attached it to this
    e-mail. I will, however, provide it on request if you e-mail me at
    phil@etsu.edu.

     
  • Michael Kay
    2012-02-19

    Thanks for reporting it.

    Please note that we're trying to migrate over to a new community site at
    http://dev.saxonica.com/redmine - you can
    raise suspected bugs directly in the bug tracker there.

    The symptoms look like an output file not being closed. So the question is,
    how are you running the query? Are you creating the output stream yourself,
    and if so, are you closing it? Do you get the same effect when running from
    the command line? Are you on Java or .NET?

    It's hard to see how this could be correlated with the optimization level of
    the query. That could be coincidence of some kind.
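
    To illustrate the "output file not being closed" symptom in general terms:
    a buffered writer hands data to the underlying stream in blocks, so if it
    is never closed or flushed, the tail of the output stays in the buffer and
    is lost. The sketch below uses only standard Java I/O. It is not Saxon's
    output code, and the class name, method names, and record layout are
    invented for the illustration.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Illustration only: shows the generic symptom, not Saxon's actual output code.
final class UnclosedOutputDemo {

    // Builds an ASCII document of roughly the size mentioned in the thread.
    // The element names here are made up for the demo.
    static String sampleDocument(int records) {
        StringBuilder sb = new StringBuilder("<data>");
        for (int i = 0; i < records; i++) {
            sb.append("<record id=\"").append(i).append("\">");
            for (int j = 0; j < 2000; j++) {
                sb.append('x');
            }
            sb.append("</record>");
        }
        return sb.append("</data>").toString();
    }

    // Writes the text through a BufferedWriter and reports how many bytes
    // actually reached the underlying stream.
    static int bytesDelivered(String text, boolean closeWriter) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            Writer out = new BufferedWriter(
                    new OutputStreamWriter(sink, StandardCharsets.US_ASCII));
            out.write(text);
            if (closeWriter) {
                out.close(); // pushes the buffered tail to the sink
            }
            // When closeWriter is false the writer is deliberately abandoned,
            // which is the mistake being demonstrated.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        String doc = sampleDocument(33);
        System.out.println("document length:         " + doc.length());
        System.out.println("delivered, closed:       " + bytesDelivered(doc, true));
        System.out.println("delivered, never closed: " + bytesDelivered(doc, false));
    }
}
```

    The unclosed case delivers fewer bytes than the document contains, and the
    cut can fall in the middle of a tag, which matches the shape of the
    truncated output shown earlier in the thread.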

     
  • Michael Kay
    2012-02-19

    Ah, I see it's .NET. That helps focus the investigation. It's still important
    to know how you are running the query.

     
  • Phil Pfeiffer
    2012-02-20

    Thanks for responding, and also for the reference to
    http://dev.saxonica.com. I'll use that in the
    future, once this discussion is concluded.

    I agree with your assessment: it looks like an output file isn't being closed
    properly. The symptom appears when I run the problem query from a Windows
    command prompt, using any of the following commands:
    xquery -s:faculty-staff-data.xml -q:temp.xql
    xquery -s:faculty-staff-data.xml -q:temp.xql >temp.txt
    xquery -opt:10 -s:faculty-staff-data.xml -q:temp.xql
    xquery -opt:10 -s:faculty-staff-data.xml -q:temp.xql >temp.txt

    Note that I've copied Saxon's query.exe executable to xquery.exe to avoid a
    naming conflict with a built-in MS command named query.

    While I'm not 100% sure of this, I believe that my student has been running
    the query using commands like
    java net.sf.saxon.Query -s:faculty-staff-data.xml -q:query.xql
    java net.sf.saxon.Query -s:faculty-staff-data.xml -q:query.xql >temp.xml

    I can check with him tomorrow in class.

    As for this being a coincidence, while it could well be, simply adding -opt:9
    allows the query to complete correctly: i.e.,
    xquery -opt:9 -s:faculty-staff-data.xml -q:temp.xql

    Also, if you'd like my faculty-staff-data.xml file, e-mail me at phil at
    etsu.edu, and I'll send it.

     
  • Phil Pfeiffer
    2012-02-23

    By way of thanks for your follow-up on my "spaces in filenames" issue, and in
    the aftermath of your request to try to isolate that other problem, I tried to
    do more tonight to isolate the file truncation syndrome described earlier in
    this thread. I first reduced my earlier query to the following query:

    declare default element namespace "http://www.cs.etsu.edu/xquery-training-assignment/csci-faculty-staff-data/0.1";
    declare copy-namespaces no-preserve, inherit;
    <foo> { for $e in /* return element bar {$e/*} } </foo>

    With this query, the premature output truncation still occurs (again, under
    .NET) when I execute

    xquery -s:faculty-staff-data -q:q.xml

    The error disappears, however, if I make any one of the following changes to
    this query and its method of invocation:

    -. add an -opt:9 option to the command line
    -. remove the copy-namespaces declaration from the query
    -. drop the "for" expression, reducing the inner content to element bar {*}
    -. remove the direct element constructor for <foo>, leaving just the for expression

    I also discovered that the truncation issue persists if I convert the 'for'
    expression to one that references the input document directly: i.e.,

    for $e in doc('faculty-staff-data.xml')/* return element bar {$e/*}

    Apologies for posting this to SourceForge instead of dev.saxonica.com, but I
    can't access dev.saxonica.com right now ("problem loading page"), and I
    wanted to submit this data to you while I was thinking of it.