Bug with Operator for Date Subtraction (...

smootz
2011-01-28
2012-10-08
  • smootz
    2011-01-28

    In both the HE and EE versions of saxon, using the subtraction operator ('-')
    on two dates results in the following error message:

    XPTY0004: An error occurred matching pattern {pattern matching node()}:
    Arithmetic operator is not defined for arguments of types (xs:double, xs:date)

    The template matching predicate that is generating this error is as follows:

    someNode
    (Note that @date is of type xs:date)

    In the EE version (which was only downloaded and used with the 30 day trial
    license to verify the same behavior existed), I attempted to enforce a schema-
    aware transformation by using the -sa flag, but that produced the same
    results. Interestingly enough, however, using the -val option (with either lax
    or strict) produces the anticipated results...that is, a duration is returned
    to the days-from-duration function, which results in the integer value needed
    for comparison.

    The point is that this should not be something that works as expected when
    schema-awareness is turned on; rather, it should work as anticipated without
    schema-awareness.

     
  • Michael Kay
    2011-01-28

    In the absence of a schema, the typed value of an attribute is always
    xs:untypedAtomic, regardless of the fact that its value might look like a date
    and its name might be "date". In a subtraction, an xs:untypedAtomic value is
    always treated as an xs:double (again, regardless of its lexical form, and
    regardless of the type of the other operand). Sorry, but that's what the spec
    says, and that's what Saxon does. You have two choices: create a typed
    document by validating against a schema, or convert the attribute to a date by
    hand (xs:date(@date)).
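
    As a rough illustration of the rule above, here is a toy Python model. It is
    not Saxon code and not the spec's algorithm in full. The names UntypedAtomic,
    xpath_type and subtract are hypothetical, invented for this sketch, and only
    the type pairs discussed in this thread are modelled.

```python
from datetime import date


class UntypedAtomic(str):
    """Hypothetical stand-in for xs:untypedAtomic (an unvalidated attribute value)."""


def xpath_type(value):
    """Return the type the '-' operator sees for an operand in this toy model."""
    if isinstance(value, UntypedAtomic):
        # Untyped operands of an arithmetic operator are treated as xs:double,
        # whatever their lexical form looks like.
        return "xs:double"
    if isinstance(value, date):
        return "xs:date"
    if isinstance(value, float):
        return "xs:double"
    raise TypeError(f"type not modelled in this sketch: {type(value).__name__}")


def subtract(left, right):
    """Toy model of '-' for the two operand type pairs discussed here."""
    types = (xpath_type(left), xpath_type(right))
    if types == ("xs:date", "xs:date"):
        return left - right  # a timedelta, playing the role of a duration
    if types == ("xs:double", "xs:double"):
        return float(left) - float(right)
    raise TypeError(
        "Arithmetic operator is not defined for arguments of types "
        f"({types[0]}, {types[1]})"
    )


attr = UntypedAtomic("2011-01-28")  # what @date yields without validation
other = date(2011, 2, 15)

try:
    subtract(attr, other)
except TypeError as err:
    print(err)  # same wording as the XPTY0004 message in the first post

# Counterpart of the explicit cast xs:date(@date): both operands are now dates.
print(subtract(date.fromisoformat(str(attr)), other).days)  # -18
```

    The untyped value fails before its content is ever inspected, which is why
    the date-like lexical form makes no difference.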

     
  • smootz
    2011-02-15

    Understood...thank you for the explanation, it is greatly appreciated. I just
    wanted to follow up with you regarding one aspect from my original post that
    you didn't comment on (the -sa flag vs. the -val flag).

    I would expect the -sa flag to treat the value as an xs:date (because it's
    enforcing schema-awareness). This is not the case, as I am still receiving the
    XPTY0004 message. However, when using the -val flag (either strict or lax),
    which automatically switches on the -sa option, the transformation works as
    expected.

    So, what's the difference between these two flags, and why are things working
    as expected using -val but not by using -sa?

     
  • Michael Kay
    2011-02-15

    Historically, the -sa flag meant "Use Saxon-SA, ensure it is licensed, and
    enable its functionality". The meaning has shifted somewhat; it now means
    "compile this query or stylesheet so that it is capable of dealing with typed
    input documents as well as untyped documents". It does not actually say
    "validate the primary input document". Perhaps it would be better if it did,
    but I'm reluctant to make that change. (It's reasonable to validate some input
    documents and not others. It's possible to control which documents are
    validated by using xsl:type and xsl:validate within the stylesheet itself. For
    example, the stylesheet might have a preprocessing phase that it has to run
    before doing validation.)

    The -val flag means "validate all input documents". This automatically
    switches on -sa, since if input documents are going to be typed, then the
    query/stylesheet had better be compiled to deal with this.

    So -sa means that the query/stylesheet can handle typed data; -val says that
    the primary input, and documents read using doc(), should be validated.
    They're subtly different.

     
  • smootz
    2011-07-06

    Since the time of my original post, we have purchased licensing for the EE
    version of Saxon and have come across another issue (maybe two) that are semi-
    related to this issue.

    Since we are operating using the schema awareness flag (-sa), I would expect
    that the stylesheets would be able to determine the type of a value from the
    schema. However, this does not appear to be the case (at least when using the
    abbreviated comparison operators).

    Let me provide a quick example/test case.

    Let's say I have an element, TempRange, with attributes min and max, both
    defined in mySchema as a simpleType with a decimal restriction base
    (totalDigits is 6, fractionDigits is 2), and have the following XML:

    ...
    <TempRange min="25" max="30.01"/>
    ...
    

    When attempting the following comparison in a template matching rule, the
    template is not triggered:

    mySchema:TempRange

    However, all else being equal, the following template matching rules are both
    triggered:

    mySchema:TempRange
    mySchema:TempRange

    So I suppose that I can ask my question in two ways (or perhaps they are
    separate questions):

    1. What's the difference between 'gt' and '>' that's causing the latter to compare correctly using the schema types that are defined and the former to require casting the attributes to xs:decimal explicitly?

    2. Why doesn't operating with the schema awareness flag make the stylesheets implicitly use the type definitions from the schema? (of course, this question assumes that there are no bugs with the abbreviated comparison operators - i.e., if the answer to #1 is that there's a bug with 'gt' because it should behave the same as '>' and use the schema types when the -sa flag is present, then this question becomes null and void)

    As always, thank you for your time and expertise :-)

     
  • Michael Kay
    2011-07-06

    1. What's the difference between 'gt' and '>' that's causing the latter to compare correctly

    Answer: the > operator converts an untyped atomic operand to the type of
    the other operand, whereas gt treats untyped atomic as string. This is strong
    evidence that your input attribute is labelled as untypedAtomic, meaning that
    the source document has not been validated. Did you specify -val on the
    command line?

    2. Why doesn't operating with the schema awareness flag make the stylesheets implicitly use the type definitions from the schema?

    Answer: I think I explained this in my earlier replies. The -sa option forces
    the stylesheet to be compiled with schema-awareness, but it does not force
    validation of input documents; for that you need the -val flag. (There are
    cases where you want a schema-aware stylesheet to operate on unvalidated
    input, for example where the purpose of the stylesheet is to convert an
    invalid document to a valid one.)

     
  • smootz
    2011-07-07

    I guess what I'm not understanding is why a document needs to be validated in
    order for the processor to treat the types as they are defined in the schema
    when schema awareness is being specified?

     
  • smootz
    2011-07-07

    I think I found the explanation I needed in your documentation:
    http://www.saxonica.com/documentation/expressions/comparisons.xml

    Specifically, the comment "...Saxon currently uses its string value in the
    comparison, not its typed value as required by the XPath 2.0 specification."

    Are there any future plans to resolve this issue and make Saxon compliant with
    the XPath specification in this regard?

     
  • Michael Kay
    2011-07-07

    I guess what I'm not understanding is why a document needs to be validated in order for the processor to treat the types as they are defined in the schema when schema awareness is being specified?

    Because validation is the process that associates nodes in the document with
    declarations in the schema. Without validation, Saxon has no idea whether
    there is any relationship between an attribute called "date" in your source
    document and a declaration of an attribute called "date" in your schema.
    Remember your schema might define lots of attributes called "date", all with
    different types...
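
    A toy Python sketch of that point, under stated assumptions: the SCHEMA
    table, the element names Invoice and LogEntry, and the type_annotations
    function are all hypothetical. It models only the idea that a type
    annotation comes from matching a node against a specific declaration, and
    it is not how Saxon implements validation.

```python
# Hypothetical schema: the attribute name "date" is declared more than once,
# with a different type in each element declaration.
SCHEMA = {
    ("Invoice", "date"): "xs:date",
    ("LogEntry", "date"): "xs:dateTime",
}


def type_annotations(element_name, attribute_names, validated):
    """Return the type annotation each attribute carries in this toy model."""
    if not validated:
        # No validation: nothing links the node to any schema declaration.
        return {name: "xs:untypedAtomic" for name in attribute_names}
    # Validation resolves each attribute against the declaration that applies
    # to this particular element; an undeclared attribute raises KeyError here.
    return {name: SCHEMA[(element_name, name)] for name in attribute_names}


print(type_annotations("Invoice", ["date"], validated=False))
print(type_annotations("Invoice", ["date"], validated=True))
print(type_annotations("LogEntry", ["date"], validated=True))
```

    The attribute name alone never decides the type; only the validated match
    against a declaration does.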

    I think I found the explanation I needed in your documentation...

    No, sorry, that sentence is long obsolete (now fixed). Saxon is 100%
    conformant with the specs in this area.

     
  • smootz
    2011-07-07

    Can you point me to a specification reference that details this should be the
    behavior of an XPath 2.0 compliant processor so that I may review it with my
    colleagues (either where it is documented that 'gt' and '>' should have
    different behaviors, or where it is documented that a schema aware processor
    should not use the type definitions of the referenced schema)?

    According to the specification documentation I've reviewed, it looks like
    Saxon should be using type promotion and subtype substitution to perform the
    appropriate comparison when using 'gt' with two numeric types (as it does when
    using '>').

    For reference, here's the spec documentation that I've reviewed:

    6.3 Comparison Operators on Numeric Values
    (http://www.w3.org/TR/2007/REC-xpath-functions-20070123/#comp.numeric)
    states that "if the arguments are of different types, one argument is
    promoted to the type of the other as described above in 6.2 Operators on
    Numeric Values."

    So following the reference to section 6.2
    (http://www.w3.org/TR/2007/REC-xpath-functions-20070123/#op.numeric),
    we find the statement "if the two operands are not of the same type,
    subtype substitution and numeric type promotion are used to obtain two
    operands of the same type...Section B.1 Type Promotion and Section B.2
    Operator Mapping describe the semantics of these operations in detail."

    Following the reference to section B.1, "type promotion is used in evaluating
    function calls and operators that accept numeric or string operands (see B.2
    Operator Mapping)."

    ...and following the reference to B.2 (http://www.w3.org/TR/xpath20/#mapping)...
    "A numeric operator may be validly applied to an operand of type AT if type AT
    can be converted to any of the four numeric types by a combination of type
    promotion and subtype substitution. If the result type of an operator is
    listed as numeric, it means "the first type in the ordered list (xs:integer,
    xs:decimal, xs:float, xs:double) into which all operands can be converted by
    subtype substitution and type promotion." As an example, suppose that the type
    hatsize is derived from xs:integer and the type shoesize is derived from
    xs:float. Then if the + operator is invoked with operands of type hatsize and
    shoesize, it returns a result of type xs:float. Similarly, if + is invoked
    with two operands of type hatsize it returns a result of type xs:integer."

    ...along with the following table snippet of interest...

    Operator   Type(A)   Type(B)   Function                        Result type
    ...        ...       ...       ...                             ...
    A gt B     numeric   numeric   op:numeric-greater-than(A, B)   xs:boolean
    ...        ...       ...       ...                             ...

    I realize that the last section states that "if the result type of an operator
    is listed as numeric, it means the first type in the ordered list (xs:integer,
    xs:decimal, xs:float, xs:double) into which all operands can be converted by
    subtype substitution and type promotion" and the table clearly shows the
    result type as a boolean, but are you telling me that the same rules of type
    promotion and subtype substitution no longer apply when comparison operators
    are used?

     
  • Michael Kay
    2011-07-07

    The gt operator is a ValueComparison, so the rules are here:
    http://www.w3.org/TR/xpath20/#id-value-comparisons

    The > operator is a GeneralComparison, so the rules are here:
    http://www.w3.org/TR/xpath20/#id-general-comparisons

    Both steps perform Atomization, which gets the typed value of the node. The
    typed value depends (in spec-speak) on whether the XDM instance was
    constructed from an infoset or a PSVI (in real language, whether the source
    document was validated or not): it will be untypedAtomic in the first case,
    numeric (or whatever) in the second. The details of this are in the XDM data
    model specification.

    Rule 4 of ValueComparisons says "If the atomized operand is of type
    xs:untypedAtomic, it is cast to xs:string."

    While Rule 2b of General Comparisons (with 1.0 mode off) says "If exactly one
    of the atomic values is an instance of xs:untypedAtomic, it is cast to a type
    depending on the other value's dynamic type T according to the following
    rules..."
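
    The two rules can be contrasted with a toy Python model. This is a sketch
    under assumptions, not Saxon code: UntypedAtomic, value_gt and general_gt
    are hypothetical names, only single string and numeric operands are
    handled, and Python's code-point string ordering stands in for the default
    collation. The detail that a numeric other operand leads to a cast to
    xs:double comes from the spec's sub-rules, which are not quoted in full
    above.

```python
class UntypedAtomic(str):
    """Hypothetical stand-in for xs:untypedAtomic (an unvalidated attribute value)."""


def value_gt(a, b):
    """Toy model of 'gt': untyped operands are cast to xs:string."""
    a = str(a) if isinstance(a, UntypedAtomic) else a
    b = str(b) if isinstance(b, UntypedAtomic) else b
    if isinstance(a, str) != isinstance(b, str):
        raise TypeError("gt is not defined for a string and a non-string operand")
    return a > b


def _cast_like(untyped, other):
    """Cast an untyped value according to the other operand's type."""
    if isinstance(other, (int, float)):
        return float(untyped)  # numeric other operand: cast to xs:double
    return str(untyped)        # string other operand: cast to xs:string


def general_gt(a, b):
    """Toy model of '>' for two single atomic values (1.0 mode off)."""
    a_untyped = isinstance(a, UntypedAtomic)
    b_untyped = isinstance(b, UntypedAtomic)
    if a_untyped and b_untyped:
        a, b = str(a), str(b)   # both untyped: compared as strings
    elif a_untyped:
        a = _cast_like(a, b)    # exactly one untyped: cast depends on the other
    elif b_untyped:
        b = _cast_like(b, a)
    return value_gt(a, b)       # then compare as a value comparison would


attr = UntypedAtomic("100")  # an attribute value from an unvalidated document

print(general_gt(attr, 25))                  # True: 100.0 > 25
print(value_gt(attr, UntypedAtomic("25")))   # False: "100" sorts before "25"
try:
    value_gt(attr, 25)
except TypeError as err:
    print(err)                               # string vs number is a type error
```

    With a validated document the operands would already be numeric, and both
    operators would take the same numeric path.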

     
  • smootz
    2011-07-07

    I appreciate the references; however, that still seems a little hokey (the
    spec itself, not Saxon)...is there at least a good/logical reason for having
    differing implementations for gt and >? I get that they're different now,
    but I'm having a hard time understanding why there's such a difference.

     
  • Michael Kay
    2011-07-07

    I'm having a hard time understanding why there's such a difference.

    XPath 1.0 defined the "<" family of operators. It was designed for handling
    unstructured untyped documents for use in the same kind of environment as
    Javascript, so the philosophy was dynamic typing, avoiding run-time errors,
    and generally trying to do the right thing in the face of unpredictable and
    possibly invalid data.

    Then XQuery came along with its background in databases and query languages,
    which is an environment where static analysis and optimization is all-
    important, and hence strict/static typing. The folks from this culture looked
    at the XPath comparison operators with horror because it's very hard to
    support them well with indexes - and so they invented a new set.

    In practice of course, people use the operators like "=" and so implementors
    have to find a way of making them work. Using indexes to support "=" with its
    strange dynamic semantics is a tough challenge, but it's certainly possible
    and in my view adding the second set of operators was a mistake - but
    committees make lots of mistakes.

     
  • smootz
    2011-07-08

    Okay, that clears it up for me.

    I appreciate the time you've taken to provide me with the background
    information I needed on these issues.

    Thank you.