Not producing text correctly from cmd line

Help
2011-09-02
2012-10-08
  • I'm trying to process an xml file using xquery and print the result as a text
    file with each returned value on one line. It seems that saxon is adding an
    extra space to each of my entries.

    Here is a minimal example xml file to illustrate this (Scores.xml):

    <?xml version="1.0"?>
    <Students>
    <Student>
    <FirstName>Bill</FirstName>
    <LastName>Jones</LastName>
    <Grade>B</Grade>
    </Student>
    <Student>
    <FirstName>James</FirstName>
    <LastName>Smith</LastName>
    <Grade>A</Grade>
    </Student>
    <Student>
    <FirstName>Sally</FirstName>
    <LastName>Masters</LastName>
    <Grade>F</Grade>
    </Student>
    </Students>
    

    And the query that I am running (query.q):

    declare option saxon:output "omit-xml-declaration=yes";
    declare option saxon:output "method=text";
    for $stud in doc("Scores.xml")/Students/Student
    order by $stud/LastName, $stud/FirstName
    return concat($stud/FirstName," ",$stud/LastName,"&#xA;")
    

    The output that I am getting is:

    Bill Jones
     Sally Masters
     James Smith
    (extra blank line here)
    

    I am using saxon 9.3he, java jdk 1.6.0_18 on Windows 7, and am running my
    query from the command line with

    java -cp saxon9he.jar net.sf.saxon.Query -q:query.q

    Without the newline added, the results show up as one line separated by
    spaces. If I add the newline before the name (which causes a blank line), I
    have confirmed that there is an extra space after the names. As far as I have
    been able to tell, saxon is printing item 1 followed by a space, then item 2
    followed by a space, then item 3 followed by a space. Thus my weird
    indentation is caused by the extra space being added after the previous name,
    without any line feeds of its own.

    Wrapping the entire query in a string-join with the newline gives the correct
    result.

    Is this intended behavior, and how do I get the result that I want? I
    understand that saxon uses the document function on the output, but my
    understanding of the spec says that the text nodes should be concatenated
    without any extra spaces, and it looks to me like there is extra space being
    added, although there is a good chance that I am misunderstanding the spec.

    Any help would be appreciated.

     
  • David Lee
    2011-09-02

    That is correct. The standard serialization of a sequence separates adjacent
    atomic values with a single space.
    http://www.w3.org/TR/xslt-xquery-serialization/

    If you want to avoid that, produce a single string as your output rather than
    a sequence, like:

    string-join(
      (for $stud in doc("Scores.xml")/Students/Student
       order by $stud/LastName, $stud/FirstName
       return concat($stud/FirstName, " ", $stud/LastName)),
      "&#xA;")

     
  • Michael Kay
    2011-09-02

    As daldei says, this is correct according to the spec. Another workaround is
    to output text nodes:

    declare option saxon:output "omit-xml-declaration=yes";
    declare option saxon:output "method=text";
    for $stud in doc("Scores.xml")/Students/Student
    order by $stud/LastName, $stud/FirstName
    return (text{$stud/FirstName, $stud/LastName}, text{"&#xA;"})
    
     
  • OK, I see that in the spec now (section 2 of the page that you linked). I had
    tried the string-join approach and it did work; it just felt like that
    couldn't be the intended way to do this, but looking at the spec, it is.

    I was going nuts trying to get around that extra space. Thank you so much for
    helping.

     
  • Thank you, Mr. Kay. This does work as well (although at the moment, I'm not as
    clear why - I'm going to have to stare at it a bit). Thank you both for the
    quick reply.

     
  • David Lee
    2011-09-02

    The reason text nodes work differently from strings is that, according to the
    serialization spec, adjacent text nodes are concatenated, whereas adjacent
    strings (atomic values, i.e. xs:string values) are separated by spaces.
    It's just the way it is.
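
    As a minimal sketch of that rule (the literal values here are made up for
    illustration, and the output option is the same one used in the query at the
    top of this thread), a query such as the following should print "a b" on the
    first line and "cd" on the second:

    ```xquery
    declare option saxon:output "method=text";
    (
      "a", "b",              (: adjacent atomic values: serialized as "a b" :)
      text{"&#xA;"},         (: a text node holding a newline :)
      text{"c"}, text{"d"}   (: adjacent text nodes: merged into "cd" :)
    )
    ```

    No space appears around the newline because a space is only inserted between
    two atomic values that sit next to each other, never next to a text node.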

    -David

     
  • I think my confusion was in understanding the difference between actual text
    nodes and a bunch of text values (strings) in the document. I think that I can
    see what is going on now. I prefer the string-join approach (because I don't
    get an extra blank line), but thank you very much for your suggestion, Mr.
    Kay; that is the one that helped me see what is going on.

     
  • Michael Kay
    2011-09-03

    Yes, the distinction between text nodes and strings is a very subtle one,
    whether you are using XQuery or XSLT, and when it comes to controlling
    whitespace it's an important distinction to understand.