I'm trying to process an XML file using XQuery and print the result as a text
file with each returned value on one line. It seems that Saxon is adding an
extra space to each of my entries.
Here is a minimal example xml file to illustrate this (Scores.xml):
And the query that I am running (query.q):
The output that I am getting is:
I am using Saxon 9.3 HE, Java JDK 1.6.0_18 on Windows 7, and am running my
query from the command line with
java -cp saxon9he.jar net.sf.saxon.Query -q:query.q
Without the newline added, the results show up as one line separated by
spaces. If I add the newline before the name (which causes a blank line), I
have confirmed that there is an extra space after each name. As far as I have
been able to tell, Saxon is printing item 1 followed by a space, then item 2
followed by a space, then item 3 followed by a space. Thus my weird indentation
is caused by the extra space being added after the previous name, without any
line feeds of its own.
Wrapping the entire query in a string-join with the newline gives the correct
result.
Is this intended behavior, and how do I get the result that I want? I
understand that Saxon uses the document function on the output, but my
reading of the spec is that the text nodes should be concatenated
without any extra spaces. It looks to me like extra space is being
added, although there is a good chance that I am misunderstanding the spec.
Any help would be appreciated.
That is correct. The standard serialization for sequences is to separate items
by a space.
http://www.w3.org/TR/xslt-xquery-serialization/
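A minimal sketch of that rule, assuming the result is serialized with the text output method and using made-up names rather than the Scores.xml data. Each item already ends with a newline, yet a space is still placed between adjacent atomic values, which gives the stray leading space described in the question.

```xquery
(: Hypothetical names, not taken from Scores.xml.
   The query returns a sequence of three xs:string values. :)
for $name in ("Ann Lee", "Bob Ray", "Cy Poe")
return concat($name, "&#10;")

(: Serialized as text, adjacent atomic values are separated by one space,
   so the second and third lines start with a space:
   Ann Lee
    Bob Ray
    Cy Poe
:)
```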
If you want to avoid that then produce a single string as your output not a
sequence.
like:
string-join( (
for $stud in doc("Scores.xml")/Students/Student
order by $stud/LastName, $stud/FirstName
return concat($stud/FirstName, " ", $stud/LastName) ), "&#10;" )
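For anyone who wants to try this without the original file, here is a self-contained sketch of the same idea. The inline element is only a hypothetical stand-in for Scores.xml: the element names follow the query above, but the data is made up.

```xquery
(: Hypothetical stand-in for Scores.xml; only the element names
   come from the query in this thread. :)
let $students :=
  <Students>
    <Student><FirstName>Bob</FirstName><LastName>Ray</LastName></Student>
    <Student><FirstName>Ann</FirstName><LastName>Lee</LastName></Student>
  </Students>
return
  (: string-join returns ONE xs:string, so the serializer has no
     adjacent items to separate with spaces. :)
  string-join(
    (
      for $stud in $students/Student
      order by $stud/LastName, $stud/FirstName
      return concat($stud/FirstName, " ", $stud/LastName)
    ),
    "&#10;"
  )
```

Because the newline is only a separator, nothing follows the last name, so there is no trailing blank line.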
As daldei says, this is correct according to the spec. Another workaround is
to output text nodes:
Ok, I see that in the spec now (section 2 of the page that you listed). I had
tried the string-join approach and it did work. It just felt like that couldn't
be the way this was intended to be done, but looking at the spec, it is.
This was driving me nuts trying to get around that extra space. Thank you
so much for helping.
Thank you, Mr. Kay. This does work as well (although at the moment, I'm not as
clear why - I'm going to have to stare at it a bit). Thank you both for the
quick reply.
The reason text nodes work differently from strings is that, according to the
serialization spec, adjacent text nodes are concatenated, whereas strings
(atomic values, aka xs:string values) are separated by spaces.
It's just the way it is.
-David
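To make the difference concrete, here is a minimal sketch of the text-node variant. It is not the query posted earlier in the thread, the names are made up, and it assumes the text output method.

```xquery
(: Hypothetical names, not taken from Scores.xml.
   Each item is a text node rather than an xs:string, so the serializer
   concatenates them without inserting spaces. :)
for $name in ("Ann Lee", "Bob Ray", "Cy Poe")
return text { concat($name, "&#10;") }
```

Every text node here carries its own newline, so the output also ends with a newline after the last name.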
I think my confusion was in understanding the difference between actual text
nodes and a bunch of text values (strings) in the document. I think that I can
see what is going on now. I prefer the string-join approach (because I don't
get an extra blank line), but thank you very much for your suggestion Mr. Kay,
that is the one that helped me see what is going on.
Yes, the distinction between text nodes and strings is a very subtle one,
whether you are using XQuery or XSLT, and when it comes to controlling
whitespace it's an important distinction to understand.