Instances with absent namespaces can only be reconstructed in-session

Roger Dahl
2017-03-01
2017-03-01
  • Roger Dahl

    Roger Dahl - 2017-03-01

    Thank you for PyXB.

    I'm having trouble getting PyXB working with multiprocessing (using the standard multiprocessing module). When I attempt to pass an object that has used PyXB from a parallel worker to the shared queue, I get the message

    WARNING:pyxb.namespace:Instances with absent namespaces can only be reconstructed in-session

    and the object does not get added to the queue.

    The object being passed should not, as far as I can tell after examining it in the debugger, be holding on to any PyXB references. It has just used PyXB earlier in the process to deserialize an XML doc and extract some values, which are stored as str and int members of the object.

    So I'm trying to find out why the multiprocessing module ends up trying to pickle and unpickle a PyXB object, and why that fails with the message about absent namespaces.

    The XML doc that was handled with PyXB earlier is a very simple doc that indeed does not have any namespaces.

    Any help much appreciated!

     
  • Peter A. Bigot

    Peter A. Bigot - 2017-03-01

    I would expect the multiprocessing module would have to pickle+unpickle the object to transfer it to another process.

    See this section. In short, schemas that have no target namespace still create a Namespace instance with a unique id; that namespace is local to a PyXB session. All QNames in the object (some of which are class properties) will reference the namespace. When objects are transformed by the copy module they get serialized using that identifier. They can then only be reconstructed in a session that can convert that identifier to the specific absent namespace, which is necessary to do validation using the information content of that namespace.

    The motivation and patches that created this behavior are documented here.

    A workaround might be to modify the schema so each absent namespace is converted to an explicit default namespace.
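
As a rough mental model of the explanation above, the sketch below shows why an identifier that is local to one session cannot be resolved in another. It is not PyXB code: `Session`, `create_absent_namespace`, and `reconstruct` are hypothetical names invented for illustration, and the real bookkeeping in PyXB is more involved.

```python
import uuid


class Session:
    """Hypothetical model of one running process that has loaded bindings."""

    def __init__(self):
        # Maps session-local identifiers to absent-namespace records.
        self._absent_namespaces = {}

    def create_absent_namespace(self):
        # Each absent namespace gets a unique identifier that is only
        # meaningful inside the session that created it.
        uid = uuid.uuid4().hex
        self._absent_namespaces[uid] = {"uid": uid}
        return uid

    def reconstruct(self, uid):
        # A serialized object carries only the identifier, so the receiving
        # session must already know which namespace it refers to.
        try:
            return self._absent_namespaces[uid]
        except KeyError:
            raise LookupError(
                "absent namespace %s cannot be reconstructed in this session" % uid
            ) from None


producer = Session()
consumer = Session()  # stands in for a second process
uid = producer.create_absent_namespace()
```

Here `producer.reconstruct(uid)` succeeds, while `consumer.reconstruct(uid)` raises `LookupError`, which mirrors the situation the warning describes.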

     
  • Roger Dahl

    Roger Dahl - 2017-04-15

    Thank you for your fast reply, Peter, and my apologies for not replying in a timely manner.

    I've used PyXB for many years now and for the most part, PyXB just does its magic flawlessly, so I haven't had reason to dive in and understand the implementation. I've read the doc and rationale you referenced but can't say I understand how the implementation works. Is the complexity required or would you have implemented PyXB differently if you knew what you now know?

    The multiprocessing module does use pickling to transfer objects across process boundaries and I did expect my object to get pickled. Rather, the issue is that an attempt is also made to pickle and unpickle a PyXB object even though my object does not hold any references to a PyXB object. PyXB was used earlier in the process to extract some values that are stored in my object, but the values are stored as plain integers and strings.

    So, as far as I can tell, using PyXB established a reference from my object to a PyXB object. During pickling, that reference is found and causes an attempt to pickle the PyXB object, even though I don't see a reference when I view the object in the debugger. If this behavior is unexpected to you, I'll try to create a minimal example. Let me know.

    The workaround for me was to extract the values from my object and store them in a dict, pass the dict through the process boundary, and then use the dict to create a new instance of my object in the other process.

    I have encountered another issue now that may be related. It relates to the same simple XML doc (that does not have a namespace) from which I was extracting values to store in the object that got pickled. When I run unit tests that serialize a binding to create the simple XML doc, I get the correct XML doc. But if I run the unit tests in a batch after other unit tests that operate on XML docs that do have namespaces, the simple XML doc gets serialized with a namespace from one of the other XML docs.

    So, it seems like PyXB gathers information about the other XML docs and references it in a global state, then ends up erroneously adding a namespace to a binding that should not have one. I can try creating a minimal example for this as well, if you like.

    Thanks again for all your help.
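
The dict-based workaround described in this post could look roughly like the sketch below. `Record` and its fields are hypothetical stand-ins for the real object, and `pickle.dumps`/`pickle.loads` stand in for the transfer that a multiprocessing queue performs. The explicit `str()` and `int()` coercion is an assumption added here, not something the post states; it guarantees that only exact built-in types cross the boundary.

```python
import pickle


class Record:
    """Hypothetical stand-in for the object passed between processes."""

    def __init__(self, identifier, size):
        self.identifier = identifier
        self.size = size

    def to_plain_dict(self):
        # str() and int() return exact built-in instances, even when the
        # stored values are instances of str or int subclasses.
        return {"identifier": str(self.identifier), "size": int(self.size)}

    @classmethod
    def from_plain_dict(cls, data):
        # Rebuild the object on the receiving side from plain values.
        return cls(data["identifier"], data["size"])


def roundtrip(record):
    # Only the plain dict is pickled; the Record itself never is.
    payload = pickle.dumps(record.to_plain_dict())
    return Record.from_plain_dict(pickle.loads(payload))
```

In a real program the worker would call `queue.put(record.to_plain_dict())` and the consumer would call `Record.from_plain_dict(queue.get())`.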

     
  • Peter A. Bigot

    Peter A. Bigot - 2017-04-16

    Some of the complexity is required given PyXB's primary goal of supporting XML validation, both in constructing the bound objects and generating XML documents from them.

    I suspect the issue is that your objects do in fact reference PyXB objects. Even integer and string values are actually instances of subclasses of the Python types, so that they can identify the XML type they came from and use that to affect generation or further validation. This is done by modifying the class hierarchy to inject PyXB mixins. I can't say why or whether any debugger would or would not display the mixins.

    Managing absent namespaces is tricky, and it's possible that a reconstructed object might end up referencing the wrong one, especially if PyXB's annotations are stripped. If you'd like to provide a reproducing case and add it as an issue on GitHub, I'll look at it next time I'm working on PyXB.

    I should note, though, that PyXB development started over eight years ago, and features such as inter-process exchange were never envisioned in its design. Except for roughly semi-annual maintenance of PyXB I haven't used Python for about five years. So though I will make a reasonable attempt to fix problems that fall within the scope of the existing implementation, it's very unlikely that the fundamental architecture will change.
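
To illustrate the point above about values being instances of subclasses of the Python types, here is a small sketch. `XmlTypeMixin`, `XmlString`, `XmlInt`, `Holder`, and `non_builtin_attributes` are hypothetical names and do not reproduce PyXB's actual class hierarchy. The sketch shows that such values compare equal to plain strings and integers while their exact type is different, and offers one possible way to detect them on an object.

```python
class XmlTypeMixin:
    """Hypothetical mixin; PyXB's real mixin classes are not shown here."""

    xml_type_name = None


class XmlString(XmlTypeMixin, str):
    xml_type_name = "xs:string"


class XmlInt(XmlTypeMixin, int):
    xml_type_name = "xs:int"


BUILTIN_SCALARS = (str, int, float, bool, bytes, type(None))


def non_builtin_attributes(obj):
    """Map attribute names to types for values whose exact type is not a built-in scalar."""
    return {
        name: type(value)
        for name, value in vars(obj).items()
        if type(value) not in BUILTIN_SCALARS
    }


class Holder:
    """Hypothetical object that stores values taken from a binding."""

    def __init__(self, name, count):
        self.name = name
        self.count = count


holder = Holder(XmlString("doc"), XmlInt(3))
print(holder.name == "doc", isinstance(holder.name, str))  # True True
print(sorted(non_builtin_attributes(holder)))  # ['count', 'name']
```

Wrapping each value in `str()` or `int()` before storing it yields exact built-in instances, which drops the subclass and anything it references.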

     
  • Roger Dahl

    Roger Dahl - 2017-04-17

    Thank you for the info.

    I don't think the debugger would see the injected PyXB mixins, so that seems very likely to be what's happening. I'll do a test to verify.

    I have also encountered an issue now where a bound object that had a namespace was serialized with a different namespace depending on the order in which the tests ran. In both cases, I seem to have gotten things working by adding pyxb.utils.domutils.BindingDOMSupport.SetDefaultNamespace(None) before the call to toxml() (at least for the ordering of tests I happen to get).

    I'll try to put some minimal repros together and put them on GitHub.

    Did you find a language you like better than Python?

     
  • Roger Dahl

    Roger Dahl - 2017-06-06

    Just wanted to let you know that PyXB is currently working perfectly for us.

    The issues I was having with specific test run orderings giving unexpected generated XML types were due to some experimental SetDefaultNamespace() calls I had accidentally left in the code. After removing them, I now get the expected types regardless of test execution order. I've also verified that the initial issue I was having is due to the injected PyXB mixins.

    Thanks again!

     
