
Performance issue when parsing large base64Binary

James
2016-02-10
2016-02-10
  • James

    James - 2016-02-10

    Hello,

    I have been experiencing performance issues when trying uploading large files.

    For example, say we have the following schema:

    <xsd:complexType name="uploadFileRequest">
      <xsd:sequence>
        <xsd:element name="file" type="xsd:base64Binary" minOccurs="1" maxOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>

    When the file is large (say 7 MB), I see a significant slowdown. I have traced the problem to the CreateFromDocument() function that PyXB generates, which is documented as: "Parse the given XML and use the document element to create a Python instance".
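A rough size estimate helps explain why only large files hurt: base64 encodes every 3 bytes as 4 characters, so a 7 MB file arrives as roughly 9.8 million characters of text that the parser, and any validation step, must scan. This is a minimal sketch of that arithmetic using the standard base64 module; the helper name encoded_length is illustrative and assumes unwrapped base64 (no line breaks inside the element).

```python
import base64


def encoded_length(n_bytes: int) -> int:
    """Length of padded base64 text for n_bytes of input: 4 characters per 3 bytes, rounded up."""
    return 4 * ((n_bytes + 2) // 3)


if __name__ == "__main__":
    seven_mb = 7 * 1024 * 1024
    # Cross-check the formula against the standard library on a small input.
    assert encoded_length(1000) == len(base64.b64encode(b"\0" * 1000))
    print(seven_mb, "bytes ->", encoded_length(seven_mb), "base64 characters")
```

Run as a script, this prints 7340032 bytes -> 9786712 base64 characters.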

    More specifically, the following line in that function takes the majority of the execution time:

        saxer.parse(io.BytesIO(xmld))

    where xmld is the XML string passed into the function.
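To confirm where the time goes inside that parse call, the standard-library profiler can wrap CreateFromDocument() directly. This is a minimal sketch: profile_call is a hypothetical helper, not part of PyXB, and it only reports what cProfile measures; it does not change any behavior.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15):
    """Run func(*args) under cProfile and return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    stream = io.StringIO()
    # Sort by cumulative time so the most expensive call chain appears near the top.
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

For example, profile_call(upload_bindings.CreateFromDocument, xmld), where upload_bindings is a placeholder for whatever module pyxbgen generated, returns the parsed instance together with a report listing the functions that consumed the most cumulative time.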

    However, when I change the type from "xsd:base64Binary" to "xsd:string", read the file into a string and transmit it over the wire, the performance issue simply disappears.

    Any ideas or hints as to why this might happen? Any help is greatly appreciated.

    Cheers,
    James

  • Peter A. Bigot

    Peter A. Bigot - 2016-02-10

    PyXB uses the Python standard base64 module, via its standard_b64decode and standard_b64encode routines. Looking at the PyXB code, it appears that module accepts representations that are not permitted by the XML Schema rules. Because PyXB is a validating processor, it must check whether the incoming encoded data is a valid XML representation, and it does this with a complex regular expression, which becomes expensive on a payload of several megabytes.
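The leniency described here is easy to see with the standard library alone: base64.standard_b64decode calls b64decode with validate=False, which silently discards characters outside the base64 alphabet. The check in this sketch is a deliberately simplified stand-in, not PyXB's regular expression: after removing whitespace it requires a length that is a multiple of 4, only alphabet characters, and at most two trailing '=' signs, and it ignores XML Schema's finer restriction on the character just before the padding.

```python
import base64
import re

# Simplified approximation of the XML Schema base64Binary lexical rule.
# Assumption: this is NOT the regular expression PyXB uses.
_B64_TEXT = re.compile(r"[A-Za-z0-9+/]*(?:==?)?")


def is_base64binary_lexical(text: str) -> bool:
    """Return True if text (whitespace ignored) looks like padded base64."""
    compact = "".join(text.split())
    if len(compact) % 4 != 0:
        return False
    return _B64_TEXT.fullmatch(compact) is not None


def lenient_decode(text: str) -> bytes:
    """Decode as the stdlib does by default: non-alphabet characters are discarded."""
    return base64.standard_b64decode(text)


if __name__ == "__main__":
    sample = "QUJD!!!!"  # "QUJD" is base64 for b"ABC"; the '!' characters are junk
    print(lenient_decode(sample))           # b'ABC'
    print(is_base64binary_lexical(sample))  # False
```

The standard decoder happily returns b'ABC' for input that a validating processor has to reject, which is why a separate lexical check is needed at all.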

    In pyxb/binding/datatypes.py, in class base64Binary, you'll see this code block:

            # This is what it costs to try to be a validating processor.
            if cls.__Lexical_re.match(xmlt) is None:
                raise SimpleTypeValueError(cls, xmlt)
    

    Try commenting out that check.

    If you'd like a workaround for this to be part of future releases, please file an issue on GitHub.
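One shape such a workaround could take is an opt-out switch instead of a commented-out check. This sketch is hypothetical: VALIDATE_BASE64_LEXICAL, decode_base64_binary and LexicalError are illustrative names, not PyXB settings or API, and the lexical test is a simplified approximation rather than PyXB's regular expression.

```python
import base64
import re
from typing import Optional

# Hypothetical global switch (not a real PyXB option): set to False when the
# input is trusted and the cost of lexical validation is not worth paying.
VALIDATE_BASE64_LEXICAL = True

# Simplified lexical check (assumption): alphabet characters plus up to two '='.
_B64_TEXT = re.compile(r"[A-Za-z0-9+/]*(?:==?)?")


class LexicalError(ValueError):
    """Raised when text is not acceptable base64Binary."""


def decode_base64_binary(text: str, validate: Optional[bool] = None) -> bytes:
    """Decode base64Binary text, checking its lexical form unless disabled."""
    if validate is None:
        validate = VALIDATE_BASE64_LEXICAL
    compact = "".join(text.split())  # whitespace is not significant here
    if validate and (len(compact) % 4 != 0 or _B64_TEXT.fullmatch(compact) is None):
        raise LexicalError("invalid base64Binary lexical form")
    return base64.standard_b64decode(compact)
```

With validation disabled, decode_base64_binary("QUJD!!!!", validate=False) falls back to the standard library's lenient behavior and returns b'ABC', which is exactly the trade-off of skipping the check: faster parsing in exchange for accepting malformed input.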

     
    • James

      James - 2016-02-10

      Thanks Peter! It works perfectly for me, and I have filed an issue on GitHub as well.

      Last edit: James 2016-02-10
