Unmarshall without validation

Forum: Help
Creator: Marty
Created: 2013-02-03
Updated: 2013-03-06
  • Marty
    2013-02-03

    I am trying to unmarshal an XML file into Python using PyXB but NOT using validation. I want to do something similar to JAXB, so that any extra elements are ignored (you lose them) but the basic elements are still available. I see the -no-validate-changes options, but the documentation says that I am "advised to pretend these options don't exist."

    The basic problem I have been given is a system that supports forward compatibility of its XML schema by only ever adding optional elements. When version 2 is released with extra elements, a version 1 user can receive a version 2 document, unmarshal it without validation (the extra optional elements are lost), and operate with all the functionality that was available in the version 1 schema.

    With PyXB, validation always seems to be on, so the version 1 user gets a bad content error and can't do anything.
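
To make the requirement concrete, here is a minimal sketch of the "keep what you know, lose the rest" behaviour. It uses only the standard library's `xml.etree.ElementTree`, not PyXB, and the element names (`item`, `name`, `quantity`, `color`) and the helper `unmarshal_lenient` are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical version 1 vocabulary: the only child elements a v1 reader knows.
V1_FIELDS = ("name", "quantity")

# A hypothetical version 2 document carrying one extra optional element, "color".
V2_DOCUMENT = (
    "<item><name>widget</name><quantity>3</quantity><color>red</color></item>"
)


def unmarshal_lenient(xml_text, known_fields=V1_FIELDS):
    """Return a dict of the known child elements and skip everything else."""
    root = ET.fromstring(xml_text)
    record = {}
    for child in root:
        if child.tag in known_fields:
            record[child.tag] = child.text
        # Any other tag was added by a later schema version; it is dropped.
    return record


if __name__ == "__main__":
    print(unmarshal_lenient(V2_DOCUMENT))
```

Note that this sketch returns plain strings and performs no type checking at all, which is part of what is given up when validation is skipped.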

     
  • David Meyer
    2013-02-05

    I am having a similar problem, though when I try to set the boolean to FALSE I still get an error. The error reads:

    StructuralBadDocumentError: Validation is required when no element_decl can be found

    I am relatively new to XML and schemas, so I am not sure if this is a problem with the source files or something else.

    Whenever I catch the exception in Python and try to get more information from it, namely with a call to e.details(), I get an attribute error from the exceptions_.py file saying that the 'Element' object has no attribute '_diagnosticName'. Am I doing something wrong here too?

     
  • Peter A. Bigot
    2013-02-05

    Yeah, that may be a problem.  First, that particular exception is raised when validation is disabled and there is neither a recognized element nor support for wildcard elements.  If it could find an element, or allowed wildcards, it would accept it even if the content model didn't allow it, but if neither can be found there's nowhere to put the corresponding material, and PyXB does not want to throw it away.

    PyXB was never intended to operate on invalid documents, and this problem does suggest that making it handle your use case is going to be more difficult.  You could replace the raise of StructuralBadDocumentError with a simple "return self" to make it permanently drop the unrecognized data.  I don't think making PyXB do that normally is within the scope of the project, but if you create a tracker ticket for that at https://sourceforge.net/apps/trac/pyxb/ I'll consider it next time I'm working on PyXB.

    That you can't call details() is arguably a bug in PyXB: the instance associated with the exception isn't of the expected type.
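
The order of checks described in this reply can be pictured as a small decision chain. The sketch below is not PyXB source code: `LenientContainer`, `UnrecognizedContentError`, and the `drop_unrecognized` flag are hypothetical names that only model the logic, with the flag standing in for the suggested local change of replacing the raise with "return self".

```python
class UnrecognizedContentError(Exception):
    """Hypothetical stand-in for the error raised when content has no home."""

    def __init__(self, value):
        super().__init__("no element declaration or wildcard for %r" % (value,))
        self.value = value  # the object that could not be placed


class LenientContainer:
    """Toy model of an element that receives child content without validation."""

    def __init__(self, declared, allows_wildcard=False, drop_unrecognized=False):
        self.declared = set(declared)        # element names the type knows
        self.allows_wildcard = allows_wildcard
        self.drop_unrecognized = drop_unrecognized
        self.content = {}                    # recognized children, by name
        self.wildcards = []                  # children accepted via a wildcard

    def append(self, name, value):
        if name in self.declared:
            # A recognized element is accepted.
            self.content.setdefault(name, []).append(value)
        elif self.allows_wildcard:
            # No declaration, but a wildcard gives the value somewhere to go.
            self.wildcards.append((name, value))
        elif self.drop_unrecognized:
            # The local modification: silently discard the value.
            pass
        else:
            # Nowhere to put the value, and it should not be thrown away.
            raise UnrecognizedContentError((name, value))
        return self
```

With `drop_unrecognized=True`, `LenientContainer(["name"], drop_unrecognized=True).append("color", "red")` returns the container unchanged instead of raising.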

     
  • Marty
    2013-02-05

    I'm seeing the same problem that engineerdtm is.

     
  • Peter A. Bigot
    2013-02-05

    At this time I'm sticking with my position that operating on invalid documents is outside the intended scope of PyXB.  If you can change the schema to support wildcard elements, your use case should be supported without having to turn off validation.
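
For readers unfamiliar with schema wildcards, the sketch below shows what a wildcard might look like in a version 1 schema: a trailing `xs:any` particle that admits elements the schema does not declare. The schema itself (`item`, `name`, `quantity`) is a hypothetical example, and the helper `has_trailing_wildcard` only inspects the schema text with the standard library; it does not validate anything and does not involve PyXB.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical version 1 schema. The xs:any particle at the end of the
# sequence lets a validating parser accept extra, undeclared elements.
SCHEMA_V1 = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="item">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="quantity" type="xs:int"/>
        <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def has_trailing_wildcard(schema_text):
    """Report whether any xs:sequence in the schema ends with xs:any."""
    root = ET.fromstring(schema_text)
    for sequence in root.iter("{%s}sequence" % XS):
        children = list(sequence)
        if children and children[-1].tag == "{%s}any" % XS:
            return True
    return False


if __name__ == "__main__":
    print(has_trailing_wildcard(SCHEMA_V1))
```

Whether a wildcard like this fits a given versioning scheme depends on how later versions declare their new elements, so treat the fragment as a starting point rather than a recommended design.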

     
  • David Meyer
    2013-02-05

    Your suggestion worked great! I am now working through the code to figure out how to report to the user which element caused the problem. If you have a suggestion, that would be great; otherwise I will just hammer my way through.

    Thanks again!

     
  • Peter A. Bigot
    2013-02-05

    The exception that was raised should have been given the value as a parameter, that being the Python object that could not be reconciled with the content model.  If you're dropping that value without raising an exception, I don't know how you would feed it back to the system.  Maybe add a list to the element that contains all dropped values (which is what the wildcard support does when the schema allows it).

     
  • David Meyer
    2013-02-07

    Thanks again for your help. I decided to just show the user a message saying that there were problems with the document and that element xxx was being ignored. This lets the user (me) know that something went wrong and a little about what it was. If I really feel like it, I can dive in and find the element that went bad.
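
The two ideas from the end of this thread, keeping a list of dropped values and telling the user which element was ignored, can be combined in a few lines. This is a minimal sketch using the standard library's `xml.etree.ElementTree` and `warnings` modules rather than a patched PyXB; the function name `unmarshal_with_report` and the sample element names are hypothetical.

```python
import warnings
import xml.etree.ElementTree as ET


def unmarshal_with_report(xml_text, known_fields):
    """Return (record, dropped): known children as a dict, the rest as a list.

    A warning naming the element is issued for every child that is dropped,
    so the user knows something was ignored and roughly what it was.
    """
    root = ET.fromstring(xml_text)
    record = {}
    dropped = []
    for child in root:
        if child.tag in known_fields:
            record[child.tag] = child.text
        else:
            dropped.append(child)  # kept for later inspection
            warnings.warn("ignoring unrecognized element <%s>" % child.tag)
    return record, dropped


if __name__ == "__main__":
    doc = "<item><name>widget</name><color>red</color></item>"
    record, dropped = unmarshal_with_report(doc, {"name"})
    print(record, [element.tag for element in dropped])
```

Returning the dropped elements alongside the record means the caller can dig into the offending element later without re-parsing the document.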