Hello,
We found that VTD-XML behaves differently from other XPath libraries when converting a node-set to a number.
This might already be fixed by the recent patch; we did not test against the current git version.
We used the following document for testing.
<?xml version="1.0"?>
<bookstore>
  <book-id>123</book-id>
  <book-id>456</book-id>
  <book-id>789</book-id>
  <book-id>234</book-id>
</bookstore>
We executed the XPath expression number(/bookstore) on the following libraries, with the results shown below:
Testing library: nokogiri , xpath: number(/bookstore)
NaN
Testing library: rexml , xpath: number(/bookstore)
123.0
Testing library: xalan-j , xpath: number(/bookstore)
NaN
Testing library: jaxen , xpath: number(/bookstore)
NaN
Testing library: lxml , xpath: number(/bookstore)
nan
Testing library: VTD-XML , xpath: number(/bookstore)
1.23456789234E11
The results of VTD-XML (and REXML) differ significantly from the other outputs.
The relevant portion of the XPath 1.0 standard on how a node-set is converted to a number is the following:
a node-set is first converted to a string as if by a call to the string function and then converted in the same way as a string argument. (Chapter 4.4 Number Functions)
So the XPath expression can be rewritten as number(string(/bookstore)). This leads to the same results as before, as expected.
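The string-to-number rule from Chapter 4.4 can be sketched in a few lines of Python. This is only an illustration of our reading of the standard (optional whitespace, an optional minus sign, a Number, optional whitespace; anything else is NaN), not the code of any of the tested libraries. The names xpath_number and _XPATH_NUMBER are hypothetical.

```python
import math
import re

# XPath 1.0 Number: Digits ('.' Digits?)? | '.' Digits, with an optional
# leading minus sign and optional surrounding XML whitespace.
# (Hypothetical helper names, used only for this illustration.)
_XPATH_NUMBER = re.compile(
    r"[ \t\r\n]*-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*"
)


def xpath_number(text: str) -> float:
    """Convert a string to a number the way XPath 1.0 number() describes."""
    if _XPATH_NUMBER.fullmatch(text) is None:
        # Any string that is not a single Number converts to NaN.
        return math.nan
    return float(text.strip(" \t\r\n"))


# String value with the whitespace between the numbers preserved.
print(xpath_number("123 456 789 234"))  # nan
# String value with the whitespace stripped.
print(xpath_number("123456789234"))     # 123456789234.0
```

Under this reading, the whitespace inside the string value is what makes the conversion yield NaN.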
It is important to note that whitespace between elements in an XML document is represented as text nodes that are children of the enclosing element.
The string value of an element node is defined like this in XPath 1.0:
The string-value of an element node is the concatenation of the string-values of all text node descendants of the element node in document order. (Chapter 5.2 Element Nodes)
So when we execute the XPath expression string(/bookstore), the result should contain the whitespace as well. VTD-XML strips the whitespace, which is non-standard behavior.
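As a rough cross-check of the definition from Chapter 5.2, the string value can be computed with Python's standard xml.etree.ElementTree by concatenating all descendant text in document order. This is a sketch, not how any of the tested libraries implement it. The helper name string_value is hypothetical, and the line breaks and indentation in DOC are an assumption about the layout of our test document.

```python
import xml.etree.ElementTree as ET

# Test document; the exact whitespace between elements is assumed here.
DOC = """<?xml version="1.0"?>
<bookstore>
  <book-id>123</book-id>
  <book-id>456</book-id>
  <book-id>789</book-id>
  <book-id>234</book-id>
</bookstore>"""


def string_value(element: ET.Element) -> str:
    """Concatenate all descendant text, including whitespace-only text."""
    # itertext() yields the text and tail strings in document order.
    return "".join(element.itertext())


root = ET.fromstring(DOC)
value = string_value(root)

# repr() makes the preserved whitespace between the numbers visible.
print(repr(value))
# Stripping all whitespace reproduces the value reported by VTD-XML.
print("".join(value.split()))  # 123456789234
```

The first print shows that the whitespace-only text between the book-id elements is part of the string value; only after removing it does the string match the VTD-XML output.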
Executing just the string(node-set) function leads to the following results:
lxml, nokogiri, rexml, xalan-j, jaxen, xpath: string(/bookstore)
123 456 789 234
library: VTD-XML , xpath: string(/bookstore)
123456789234
When VTD-XML then calls the number function with this wrong string value, as in number('123456789234'), it produces the wrong final result of 123456789234 (printed as 1.23456789234E11).
Thank you for your time and we hope this is helpful.