xsd:any is greedy

  • Omar

    Omar - 2013-07-25


    I am having trouble when parsing documents whose schemas have xsd:any elements in a sequence. Here is an example:

    <?xml version="1.0" encoding="utf-8"?>
    <s:schema elementFormDefault="qualified" targetNamespace="testNs" xmlns:tns="testNs" xmlns:s="http://www.w3.org/2001/XMLSchema">
          <s:complexType name="MyObjectType">
            <s:sequence>
              <s:any minOccurs="0" maxOccurs="unbounded" />
              <s:element minOccurs="0" maxOccurs="1" name="Name" type="s:string" />
            </s:sequence>
          </s:complexType>
          <s:element name="obj" type="tns:MyObjectType"/>
    </s:schema>

    Notice that s:any occurs before the Name element.
    Now, when I try to run the following code

    xml = """<obj xmlns="testNs">
    my_obj = ws.CreateFromDocument(xml)
    print my_obj.Name
    print my_obj.wildcardElements()

    the result is


    I would expect that the parser recognizes the Name element and parses it correctly.
    I don't know if this is a bug or not, but is it somehow possible to get the desired behaviour?


  • Peter A. Bigot

    Peter A. Bigot - 2013-07-25

    Correct your schema. 3.10.2 XML Representation of Wildcard Schema Components:

    Wildcards are subject to the same ambiguity constraints (Unique Particle
    Attribution (§3.8.6)) as other content model particles: If an instance
    element could match either an explicit particle and a wildcard, or one of
    two wildcards, within the content model of a type, that model is in error.

    The Name element matches the wildcard, so the wildcard can consume it regardless of whether it could also match a subsequent element.

    You need to define the desired behavior more carefully. Commonly, wildcard elements appear at the end of a sequence. Alternatively, elements can be excluded from matching by using a namespace constraint on the wildcard.
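
To make the greedy behaviour described above concrete, here is a toy model in plain Python. It is not PyXB's implementation and uses no PyXB API. All names (`greedy_match`, `any_namespace`, `other_namespace`, `named`) are hypothetical, and `other_namespace` is only a simplified stand-in for a wildcard with a namespace constraint such as `namespace="##other"`. It assumes every particle has minOccurs="0", as in the schema from the question, so a particle may always be skipped.

```python
TARGET_NS = "testNs"


def any_namespace(ns, name):
    # Stand-in for an unconstrained wildcard: accepts every element.
    return True


def other_namespace(ns, name):
    # Simplified stand-in for a namespace-constrained wildcard:
    # accepts only elements outside the target namespace.
    return ns != TARGET_NS


def named(expected):
    # Stand-in for an explicit element particle in the target namespace.
    def match(ns, name):
        return ns == TARGET_NS and name == expected
    return match


def greedy_match(particles, elements):
    """Give each element to the earliest particle that still accepts it.

    particles: list of (label, accepts, max_occurs); max_occurs None = unbounded.
    elements:  list of (namespace, local_name) tuples in document order.
    The matcher never backtracks, and every particle is treated as optional.
    """
    result = {label: [] for label, _, _ in particles}
    idx = 0
    for element in elements:
        while idx < len(particles):
            label, accepts, max_occurs = particles[idx]
            has_room = max_occurs is None or len(result[label]) < max_occurs
            if accepts(*element) and has_room:
                result[label].append(element[1])
                break
            idx += 1  # move on to the next particle in the sequence
        else:
            raise ValueError("no particle accepts %r" % (element,))
    return result


# Content model from the question: wildcard first, then Name.
wildcard_first = [("wildcard", any_namespace, None), ("Name", named("Name"), 1)]
# The two remedies mentioned above: wildcard last, or a constrained wildcard.
wildcard_last = [("Name", named("Name"), 1), ("wildcard", any_namespace, None)]
constrained = [("wildcard", other_namespace, None), ("Name", named("Name"), 1)]
```

In this model, `greedy_match(wildcard_first, [("testNs", "Name")])` hands the Name element to the wildcard, while both `wildcard_last` and `constrained` leave it for the explicit Name particle.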

  • Omar

    Omar - 2013-07-29

    Unfortunately, I am not in a position to change the original .xsd. If one were to work around this behaviour, what would be the correct approach?

    Would something like the following be correct?

    for element in list(my_object.wildcardElements()):
        if isinstance(element, pyxb.binding.datatypes.string):
            if element._element().name().localName() == 'Name':
                my_object.Name = element.title()

    (I am iterating over wildcardElements() and removing the ones I "know" are "wrongly" consumed.)
    Should I be removing the said elements from some other lists (orderedContent() for example)?

    Thanks for answering.

  • Peter A. Bigot

    Peter A. Bigot - 2013-07-29

    I can't say whether your workaround would be "correct", since you're starting from a schema with erroneous content models. A fundamental goal of PyXB was to support validation, and working around errors in XML schema is out of scope.

    If your proposal works for you, then you should use it. I wouldn't expect you to need to change anything else including the ordered content, but I haven't really thought about what else might be affected. You'll just have to try it and see.

    One other thing you might also try is disabling validation when parsing; search the documentation and earlier questions here for how to do that. It's possible PyXB would stuff the value in the element instead of treating it as a wildcard in that case, but it's fairly likely there would be other side effects. Again, this is something where you're on your own. Sorry.

  • Omar

    Omar - 2013-07-31

    Removing elements from wildcardElements() and orderedContent() seems to work.

    Thank you for answering.
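
For readers landing here later, the workaround reported in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested PyXB recipe. It assumes that `wildcardElements()` and `orderedContent()` return the mutable lists held by the binding instance, and that ordered-content entries expose the wrapped object as `.value`. The helper `reclaim_wildcards` and the `Stub*` classes are hypothetical stand-ins so that the sketch runs without PyXB installed.

```python
def reclaim_wildcards(obj, is_misplaced, assign):
    """Move wildcard matches that belong to a declared element back to it.

    obj:          object offering wildcardElements() and orderedContent()
    is_misplaced: callable(element) -> bool, picks the wrongly consumed ones
    assign:       callable(obj, element), stores the value where it belongs
    Returns the list of reclaimed elements.
    """
    reclaimed = []
    # Iterate over a copy, because the underlying list is mutated below.
    for element in list(obj.wildcardElements()):
        if not is_misplaced(element):
            continue
        assign(obj, element)
        # Remove by identity so that an equal-valued sibling is left alone.
        wild = obj.wildcardElements()
        wild[:] = [w for w in wild if w is not element]
        # Assumption: each ordered-content entry wraps its object as `.value`.
        content = obj.orderedContent()
        content[:] = [c for c in content if getattr(c, "value", c) is not element]
        reclaimed.append(element)
    return reclaimed


# --- Hypothetical stand-ins, used only to exercise the helper ---
class StubElement(object):
    def __init__(self, local_name, text):
        self.local_name = local_name
        self.text = text


class StubContent(object):
    def __init__(self, value):
        self.value = value


class StubBinding(object):
    def __init__(self, wildcards):
        self.Name = None
        self._wild = list(wildcards)
        self._content = [StubContent(w) for w in wildcards]

    def wildcardElements(self):
        return self._wild

    def orderedContent(self):
        return self._content


def demo():
    obj = StubBinding([StubElement("Name", "foo"), StubElement("extra", "bar")])
    reclaim_wildcards(
        obj,
        is_misplaced=lambda e: e.local_name == "Name",
        assign=lambda o, e: setattr(o, "Name", e.text),
    )
    return obj
```

With real PyXB bindings, the `is_misplaced` test would be something like the `_element().name().localName() == 'Name'` check from the earlier post, and whether the ordered content needs touching at all is something to verify against your own documents.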

