#33 special case sequence with optional elements

PyXB 1.1.2
Content model
PyXB 0.5.1

The dwGML schema incorporates a content model that has a sequence containing 99 individual elements, each with minOccurs=0 and maxOccurs=1. Generating a DFA from the content model produces 99+c states, which takes a long time and wastes a lot of space. Consider providing an alternative model for sequence content, as was done for choice.
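To make the cost concrete, here is a minimal sketch of a hand-built DFA for a sequence in which every element is optional. It is not PyXB's NFA-to-DFA code. The function names and the element names e0..e98 are hypothetical, and it assumes every element name in the sequence is distinct.

```python
def build_optional_sequence_dfa(names):
    """Build a DFA for a sequence in which every element is optional.

    Hypothetical illustration, not PyXB code. State i means "only
    elements with index >= i may still appear". Returns a dict mapping
    (state, element_name) -> next_state.
    """
    n = len(names)
    transitions = {}
    for state in range(n + 1):
        # From state i, any later element may appear next, because all
        # the elements in between are optional and can be skipped.
        for j in range(state, n):
            transitions[(state, names[j])] = j + 1
    return transitions


def accepts(transitions, document):
    """Run the DFA over a list of element names."""
    state = 0
    for name in document:
        key = (state, name)
        if key not in transitions:
            return False
        state = transitions[key]
    # Every state is accepting: all remaining elements are optional.
    return True


names = ["e%d" % i for i in range(99)]
dfa = build_optional_sequence_dfa(names)
print(len(names) + 1, "states,", len(dfa), "transitions")
```

For 99 optional elements this gives 100 states but 4950 transitions, so even when the state count is linear, the transition table grows quadratically with the length of the sequence.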




  • Peter A. Bigot

    Peter A. Bigot - 2009-07-26
    • status changed from new to accepted
    • milestone changed from PyXB 0.5.2 to PyXB 1.0.0

    Moving this out as it's a low priority performance enhancement.

  • Peter A. Bigot

    Peter A. Bigot - 2009-08-08
    • type changed from defect to enhancement
  • Peter A. Bigot

    Peter A. Bigot - 2010-02-19
    • priority changed from minor to major
    • milestone changed from PyXB 1.0.0 to PyXB 1.1.2
  • Aoriste Boutade

    Aoriste Boutade - 2010-05-18
    • cc aoriste added

    As discussed here, I do indeed think that this is the problem. This is relatively high priority for me as I have been using PyXB at work for almost two months. I've built a few tools (XML <-> PostgreSQL conversions) using PyXB, but always tested with small example XML documents (my bad). When I used a larger example document, the running time of CreateFromDocument exploded. For me, this is unacceptable because I plan on integrating this function into a command line tool for performing regular imports and exports of XML data.

    If assistance might be helpful (other than the testing and feedback I will continue to give anyway), I would be happy to offer it. If this is not corrected in a timely fashion, project deadline constraints would force me to seek another option in this niche (generateDS, or some schema-ignorant method using elementtree), so helping to correct this bug would help me to complete my for-pay project on time.

  • Peter A. Bigot

    Peter A. Bigot - 2010-05-30
    • status changed from accepted to closed
    • resolution set to fixed

    Fixed in trac/33 and merged to next. Complete replacement of the model group portion of the content model. The NFA-to-DFA approach is gone. The resulting system does a better job in less space and significantly faster: 30% on the standard tmsxtvd test, orders of magnitude on documents with large sequences of optional elements.
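As an illustration of why a sequence of optional elements does not need an automaton at all, here is a minimal sketch of a direct matcher. It is not the replacement model-group code from trac/33. The function name is hypothetical, and it assumes distinct element names, each with minOccurs=0 and maxOccurs=1.

```python
def match_optional_sequence(names, document):
    """Check a document against a sequence of optional elements.

    Hypothetical illustration, not PyXB code. `names` is the declared
    element order; `document` is the list of element names that
    actually appear.
    """
    # Map each element name to its position in the declared sequence.
    index = {name: i for i, name in enumerate(names)}
    position = 0  # lowest declared position still allowed
    for name in document:
        i = index.get(name)
        # Reject unknown elements, repeats, and out-of-order elements.
        if i is None or i < position:
            return False
        position = i + 1
    return True


names = ["e%d" % i for i in range(99)]
print(match_optional_sequence(names, ["e2", "e40", "e98"]))
```

This sketch uses memory linear in the number of declared elements and time linear in the length of the document, with no state or transition table to generate up front.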

