#184 Extension Element Order Incorrect

PyXB 1.2.2
closed
None
fixed
Content model
major
PyXB 1.2.1
defect
2013-01-16
2013-01-02
No

The ordered symbol set is in the wrong order for the attached test. The schema below should produce an instance._symbolSet() of
["a","b","c"]. Instead it produces ["c","a","b"]. It can load the XML fragment below, but when it attempts to print it, it raises:

  File "content.py", line 631, in sequencedChildren
    raise pyxb.UnprocessedElementContentError(self.instance, cfg, symbols, symbol_set)

From what I can tell, the code (content.py:~570) first locates "c" in the DOM and then concludes that "a" and "b" are out of place.

--- Schema excerpt ---
<xs:element name="Test" type="Outer"/>
<xs:complexType name="Outer">
  <xs:complexContent>
    <xs:extension base="Inner">
      <xs:sequence>
        <xs:element name="c" type="xs:string" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
<xs:complexType name="Inner">
  <xs:sequence>
    <xs:element name="a" type="xs:string" minOccurs="0"/>
    <xs:element name="b" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>


--- XML fragment ---
<?xml version="1.0" encoding="UTF-8"?>
<Test xmlns="http://foo.org/test">
  <a>A</a>
  <b>B</b>
  <c>C</c>
</Test>

1 Attachments

Discussion

  • Harold Solbrig

    Harold Solbrig - 2013-01-02

    Example of issue

     
  • Harold Solbrig

    Harold Solbrig - 2013-01-02

    Did a bit more peeking and the original diagnosis was incorrect. It appears that the issue can be fixed by removing the "break" statement at line 615 in content.py:

    selected_xit = xit
    if 0 == len(matches):
        del symbol_set[ed]
    ---> # break
    if selected_xit is None:

    An alternative might be to add a "continue" immediately after the del statement - not entirely clear when len(matches) ends up > 0, but either solution solves the issue on this side.

     
  • Peter A. Bigot

    Peter A. Bigot - 2013-01-02
    • status changed from new to accepted

    Capturing the exception and printing its details() produces:

    The containing element {http://foo.org/test}Test is defined at <unknown>[3:4].
    The containing element type {http://foo.org/test}Outer is defined at <unknown>[4:4]
    The {http://foo.org/test}Outer automaton is in an accepting state.
    The last accepted content was {http://foo.org/test}c
    No elements or wildcards would be accepted at this point.
    The following content was not processed by the automaton:
            {http://foo.org/test}a (1 instances)
            {http://foo.org/test}b (1 instances)
    

    While it may appear to be "fixed" by eliminating the break, that break is positioned correctly to prevent state explosion in non-deterministic automata like this.

    In the past, element generation order was affected by the iteration order of Python set objects (which is determined by the address of the members, and is therefore effectively random). PyXB 1.2 normalized the order of elements to match the order in the schema (#181) so there's consistency when generating content from "all" model groups and when the same instance is converted to DOM multiple times.

    In this case, the schema order is the wrong choice because the extension comes before what was extended. Infrastructure will have to be added so that the order is influenced by extension hierarchy as well as schema position. (Note that the bug does not occur when Inner is moved before Outer in the schema, so the automaton itself is arguably correct, just not the manner in which its non-determinism is resolved.)
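
The effect of candidate order on greedy generation can be illustrated with a toy model. This is only a sketch under stated assumptions: the transition table, state names, and the generate() helper are hypothetical and are not PyXB code. It models the Outer content as optional a, b, c in sequence, and commits to the first usable candidate at each step, without backtracking.

```python
# Hypothetical model of the Outer content: a?, b?, c? in sequence.
# States are named after the last element accepted; every state is accepting.
TRANSITIONS = {
    "start": {"a": "after_a", "b": "after_b", "c": "after_c"},
    "after_a": {"b": "after_b", "c": "after_c"},
    "after_b": {"c": "after_c"},
    "after_c": {},
}


def generate(available, candidate_order):
    """Greedily emit elements, taking the first usable candidate each step."""
    remaining = dict(available)  # element name -> number of instances left
    state, emitted = "start", []
    while True:
        for name in candidate_order:
            if name in TRANSITIONS[state] and remaining.get(name, 0) > 0:
                remaining[name] -= 1
                emitted.append(name)
                state = TRANSITIONS[state][name]
                break  # commit to this choice; no backtracking
        else:
            break  # no candidate transition could consume anything
    unprocessed = sorted(n for n, count in remaining.items() if count > 0)
    return emitted, unprocessed


content = {"a": 1, "b": 1, "c": 1}
schema_order_result = generate(content, ["c", "a", "b"])  # Outer declared first
model_order_result = generate(content, ["a", "b", "c"])   # content-model order
print(schema_order_result)
print(model_order_result)
```

With the candidates tried in schema order, the sketch accepts "c" first and is left with "a" and "b" unprocessed, which mirrors the details() output above. With the candidates tried in content-model order, all three elements are emitted.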

     
  • Harold Solbrig

    Harold Solbrig - 2013-01-02

    Yes - further investigation into other regression test failures showed that the issue is in the generation of the state machine, not the interpretation.

    We've encountered several situations, however, where we extend a set of optional elements imported from a schema entitled "Core.xsd" and follow it by a mandatory element in a schema called "CodeSystemVersion.xsd". The name causes the mandatory element to sort to the front of the list in the state machine, which in turn causes a transition to the next state without processing the optional elements. We would be happy to submit a sample of the issue, although it will be a tad more difficult to put into the unit test format you are using.
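
A small sketch of how a schema-location sort key could produce this ordering. It assumes the key is a (schema name, line, column) tuple; the ElementUse record, the element names, and the line/column values are hypothetical stand-ins and are not taken from PyXB or from the schemas mentioned.

```python
from collections import namedtuple

# Hypothetical stand-in for an element use and its schema "address".
ElementUse = namedtuple("ElementUse", "name schema line column min_occurs")

uses = [
    ElementUse("optionalA", "Core.xsd", 10, 4, 0),
    ElementUse("optionalB", "Core.xsd", 11, 4, 0),
    ElementUse("mandatory", "CodeSystemVersion.xsd", 20, 8, 1),
]


def schema_order_key(use):
    # Assumed key: schema name first, then position within that schema.
    return (use.schema, use.line, use.column)


ordered = [use.name for use in sorted(uses, key=schema_order_key)]
print(ordered)
```

Because "CodeSystemVersion.xsd" compares less than "Core.xsd" as a string, the mandatory element sorts ahead of the optional ones even though it follows them in the content model.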

    For the time being, we can pass most of the unit tests by disabling the sort that was introduced in #181. Perhaps the first element of the sort key should be the min cardinality?

     
  • Peter A. Bigot

    Peter A. Bigot - 2013-01-02

    I don't think the sort should be based on anything other than the order in the content model; cardinality is not relevant for priority since it's checked against available symbols. When no extension types are used, using declaration order is right for sequence model groups (where it's required to satisfy the content model) and for choice and all model groups (where it eliminates non-determinism in generation per #181).

    pyxb.binding.content.ElementUse should be in one-to-one correspondence with element declarations in the schema and hence with complex types, so it can be extended with an ordinal to be used when emitting transitions instead of inferring it from the line/column/schema "address" of the use.

    I believe the fix will involve detecting when an extension type is used and assigning the ordinals for its element uses in declaration order but starting from the highest ordinal in the base type, so they are truly an extension of rather than interleaved with the base type elements. This should fix the original problem and, as I understand it, the situation you describe in the previous comment. I'll extend the test case to cover that situation (and thanks for providing it).
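
The proposed numbering can be sketched as follows. This is an illustration only, assuming each type is described by a list of its declared element names and an optional base type; assign_ordinals(), base_of, and declared are hypothetical names and not part of pyxb.binding.content.

```python
def assign_ordinals(type_name, base_of, declared, ordinals=None):
    """Number element uses so a base type's elements precede its extension's."""
    if ordinals is None:
        ordinals = {}
    base = base_of.get(type_name)
    if base is not None:
        # Number the base type first, so the extension continues from there.
        assign_ordinals(base, base_of, declared, ordinals)
    next_ordinal = max(ordinals.values(), default=-1) + 1
    for name in declared[type_name]:  # declaration order within the type
        if name not in ordinals:
            ordinals[name] = next_ordinal
            next_ordinal += 1
    return ordinals


# The types from the ticket: Outer extends Inner.
base_of = {"Outer": "Inner"}
declared = {"Outer": ["c"], "Inner": ["a", "b"]}

ordinals = assign_ordinals("Outer", base_of, declared)
candidate_order = sorted(ordinals, key=ordinals.get)
print(ordinals)
print(candidate_order)
```

Sorting candidates by these ordinals puts "a" and "b" ahead of "c" regardless of where Outer and Inner appear in the schema file.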

    Schedule-wise I'm looking at dealing with this next week; hope that's OK. Context switches between PyXB and BSP430 are pretty heavy, so I try to avoid going back and forth too much.

     
  • Peter A. Bigot

    Peter A. Bigot - 2013-01-16
    • status changed from accepted to closed
    • resolution set to fixed

    Closed with these commits. Note that this fix may change the format of pickled schema components in archive files; regenerate all bundles and other archives to ensure the new format is used.

    commit ddf2902b29860f3b8fe9b1a441fc216ceb64af80
    Author: Peter A. Bigot <pab@pabigot.com>
    Date:   Tue Jan 15 18:21:31 2013 -0600
    
        trac/184: extension element order incorrect
    
        When generating the code for an automaton, walk the original content model
        term tree to assign ordinals to each node so that earlier candidates are
        considered before later ones.  Retain the schema-based order for sorting
        where the content model restrictions are not relevant.
    
    commit 8e7d156e4c4443c4b19d348a061be7ad71514038
    Author: Peter A. Bigot <pab@pabigot.com>
    Date:   Tue Jan 15 17:12:14 2013 -0600
    
        rename bindingSortKey to schemaOrderSortKey
    
        Sorting by order within schema is useful to maintain consistency when
        generating module code, but is incorrect for bindings.
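
The term-tree walk described in the first commit message can be sketched roughly as below. This is not the committed code: the tuple representation of model groups and the assign_term_ordinals() helper are assumptions made for illustration. It relies on the XML Schema rule that the content of an extension type is effectively a sequence of the base content followed by the extension's own content.

```python
import itertools


def assign_term_ordinals(term, counter=None, ordinals=None):
    """Pre-order walk that numbers element leaves in content-model order."""
    if counter is None:
        counter = itertools.count()
    if ordinals is None:
        ordinals = {}
    if isinstance(term, str):
        # Leaf: an element declaration, identified here by its name.
        if term not in ordinals:
            ordinals[term] = next(counter)
    else:
        # Model group: (kind, children), e.g. ("sequence", [...]).
        _kind, children = term
        for child in children:
            assign_term_ordinals(child, counter, ordinals)
    return ordinals


# Effective content of Outer: Inner's sequence, then the extension's sequence.
outer_model = ("sequence", [("sequence", ["a", "b"]), ("sequence", ["c"])])
term_ordinals = assign_term_ordinals(outer_model)
print(term_ordinals)
```

Since the walk follows the content model rather than file position, the base type's elements receive lower ordinals than the extension's elements.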
    
     
