Unmarshal without validation

Help
Marty
2013-02-03
2013-03-06
  • Marty

    Marty - 2013-02-03

    I am trying to unmarshal an XML file into Python using PyXB but NOT using validation. I want to do something similar to JAXB, so that any extra elements are ignored (you lose them) but the basic elements are still available. I see the -no-validate-changes options, but the documentation says that I am "advised to pretend these options don't exist."

    The basic problem I have been given is a system that supports forward compatibility of its XML schema by only ever adding optional elements. When version 2 is released with extra elements, a version 1 user can receive a document written against the version 2 schema, unmarshal it without validation (the extra optional elements are lost), and operate with all the functionality that was available in the version 1 schema.

    With PyXB, validation always seems to be on, so the version 1 user gets a bad content error and can't do anything.
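    A minimal sketch of the JAXB-like behavior described above. It does not use PyXB at all; it swaps in the standard library's xml.etree.ElementTree, and the v1 element names (name, port) and the v2 addition (timeout) are hypothetical. It also assumes the document uses no XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical version-1 binding: the only child elements a v1 reader knows.
V1_FIELDS = ("name", "port")


def unmarshal_v1(xml_text):
    """Parse leniently, keeping only the elements a v1 reader understands."""
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        if child.tag in V1_FIELDS:
            result[child.tag] = child.text
        # Any other element (e.g. one added in v2) is silently ignored.
    return result


# A document written against the hypothetical v2 schema.
v2_doc = """<config>
  <name>server-a</name>
  <port>8080</port>
  <timeout>30</timeout>
</config>"""

print(unmarshal_v1(v2_doc))  # {'name': 'server-a', 'port': '8080'}
```

    The v1 reader keeps working because it never asks whether the whole document matches its content model; it only looks for the elements it knows.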

     
  • David Meyer

    David Meyer - 2013-02-05

    I am having a similar problem, though when I set the boolean to False I still get an error. The error reads:

    StructuralBadDocumentError: Validation is required when no element_decl can be found

    I am relatively new to XML and schemas, so I am not sure if this is a problem with the source files or something else.

    Whenever I catch the exception raised in Python and try to get information from it, namely with a call to e.details(), I get an AttributeError in the exceptions_.py file saying that the 'Element' object has no attribute '_diagnosticName'. Am I doing something wrong here too?

     
  • Peter A. Bigot

    Peter A. Bigot - 2013-02-05

    Yeah, that may be a problem.  First, that particular exception is raised when validation is disabled and there is neither a recognized element nor support for wildcard elements.  If it could find an element, or allowed wildcards, it would accept it even if the content model didn't allow it, but if neither can be found there's nowhere to put the corresponding material, and PyXB does not want to throw it away.

    PyXB was never intended to operate on invalid documents, and this problem does suggest that making it handle your use case is going to be more difficult.  You could replace the raise of StructuralBadDocumentError with a simple "return self" to make it permanently drop the unrecognized data.  I don't think making PyXB do that normally is within the scope of the project, but if you create a tracker ticket for that at https://sourceforge.net/apps/trac/pyxb/ I'll consider it next time I'm working on PyXB.
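    The placement rules described above can be modelled with a toy class. This is not PyXB's code: UnvalidatedInstance, UnplaceableContentError and the drop_unknown flag are hypothetical names, with drop_unknown=True standing in for the suggested "return self" patch.

```python
class UnplaceableContentError(Exception):
    """Hypothetical stand-in for PyXB's StructuralBadDocumentError."""

    def __init__(self, tag):
        super().__init__(f"no element declaration or wildcard for <{tag}>")
        self.tag = tag


class UnvalidatedInstance:
    """Toy model of how an unvalidated instance places child content.

    NOT PyXB's implementation; it only mirrors the rules described above.
    """

    def __init__(self, declared, allows_wildcard=False, drop_unknown=False):
        self.declared = set(declared)      # element names the schema declares
        self.allows_wildcard = allows_wildcard
        self.drop_unknown = drop_unknown   # True mimics the "return self" patch
        self.elements = []                 # recognized (tag, value) pairs
        self.wildcard_elements = []        # content matched by a wildcard

    def append(self, tag, value):
        if tag in self.declared:
            # Recognized element: kept without checking the content model.
            self.elements.append((tag, value))
        elif self.allows_wildcard:
            # The schema allows wildcards, so there is somewhere to put it.
            self.wildcard_elements.append((tag, value))
        elif not self.drop_unknown:
            # Nowhere to put the material and no permission to discard it.
            raise UnplaceableContentError(tag)
        # Otherwise the value is silently discarded.
        return self


strict = UnvalidatedInstance(["name", "port"])
strict.append("name", "server-a")
try:
    strict.append("timeout", "30")
except UnplaceableContentError as exc:
    print(exc)  # no element declaration or wildcard for <timeout>

open_model = UnvalidatedInstance(["name", "port"], allows_wildcard=True)
open_model.append("timeout", "30")
print(open_model.wildcard_elements)  # [('timeout', '30')]

lossy = UnvalidatedInstance(["name", "port"], drop_unknown=True)
lossy.append("timeout", "30")
print(lossy.elements, lossy.wildcard_elements)  # [] []
```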

    That you can't call details() is arguably a bug in PyXB: the instance associated with the exception isn't of the expected type.
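    Until the details() problem is fixed, one defensive option is to fall back to the exception's string form. A minimal sketch; describe_error and the two exception classes are hypothetical, used only to simulate a working and a failing details().

```python
def describe_error(exc):
    """Return a readable description of a parse exception.

    Tries exc.details() first; if the exception has no details() method,
    or details() itself raises AttributeError (as reported in this thread),
    fall back to str(exc).
    """
    details = getattr(exc, "details", None)
    if callable(details):
        try:
            return details()
        except AttributeError:
            pass
    return str(exc)


class BrokenDetailsError(Exception):
    """Hypothetical exception whose details() fails, simulating the bug."""

    def details(self):
        raise AttributeError("'Element' object has no attribute '_diagnosticName'")


class GoodDetailsError(Exception):
    """Hypothetical exception whose details() works."""

    def details(self):
        return "unexpected element <timeout> in <config>"


print(describe_error(BrokenDetailsError("Validation is required when no element_decl can be found")))
print(describe_error(GoodDetailsError("short message")))
print(describe_error(ValueError("plain error")))
```

    In practice describe_error would wrap whatever call parses the document (for PyXB-generated bindings, typically the module's CreateFromDocument function).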

     
  • Marty

    Marty - 2013-02-05

    I'm seeing the same problem that engineerdtm is.

     
  • Peter A. Bigot

    Peter A. Bigot - 2013-02-05

    At this time I'm sticking with my position that operating on invalid documents is outside the intended scope of PyXB.  If you can change the schema to support wildcard elements, your use case should be supported without having to turn off validation.

     
  • David Meyer

    David Meyer - 2013-02-05

    Your suggestion worked great! I am trying to work through the code to figure out how to report to the user which element caused the problem, though. If you have a suggestion that would be great; otherwise I will just hammer my way through.

    Thanks again!

     
  • Peter A. Bigot

    Peter A. Bigot - 2013-02-05

    The exception that was raised should have been given value as a parameter, that being the Python object that could not be reconciled with the content model. If you're dropping that value without raising an exception, I don't know how you would feed it back to the system. Maybe add a list to the element that contains all dropped values (which is what wildcard handling does when the schema supports it).
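    Recording dropped values on the result object, then telling the user which elements were ignored, might look like the following minimal sketch. It uses xml.etree.ElementTree and the warnings module rather than PyXB; ParsedConfig, parse_and_report and the element names are hypothetical.

```python
import warnings
import xml.etree.ElementTree as ET

KNOWN_ELEMENTS = {"name", "port"}  # hypothetical v1 element names


class ParsedConfig:
    """Hypothetical result object that remembers what it had to drop."""

    def __init__(self):
        self.fields = {}
        self.dropped = []  # (tag, text) pairs that could not be placed


def parse_and_report(xml_text):
    root = ET.fromstring(xml_text)
    cfg = ParsedConfig()
    for child in root:
        if child.tag in KNOWN_ELEMENTS:
            cfg.fields[child.tag] = child.text
        else:
            # Keep the dropped content instead of losing it silently.
            cfg.dropped.append((child.tag, child.text))
    # Tell the user which elements were ignored.
    for tag, _text in cfg.dropped:
        warnings.warn(f"problems with the document: element <{tag}> was ignored")
    return cfg


cfg = parse_and_report("<config><name>a</name><timeout>30</timeout></config>")
print(cfg.fields)   # {'name': 'a'}
print(cfg.dropped)  # [('timeout', '30')]
```

    Keeping the dropped list means the caller can inspect exactly what was lost later, while the warning gives an immediate hint without stopping the program.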

     
  • David Meyer

    David Meyer - 2013-02-07

    Thanks again for your help. I decided to just throw a message to the user that there were problems with the document and that xxx element was being ignored. This lets the user (me) know that something went wrong and a little about what it was. If I really feel like it I can dive in and find the element that went bad.

     
