Hello,
I have been experiencing performance issues when uploading large files.
For example, say we have the following schema:
<xsd:complexType name="uploadFileRequest">
  <xsd:sequence>
    <xsd:element name="file" type="xsd:base64Binary" minOccurs="1" maxOccurs="1"/>
  </xsd:sequence>
</xsd:complexType>
When the file is large, say 7 MB, I notice a significant performance issue. I have traced the problem to the CreateFromDocument() function, which is generated by PyXB. It is used to "Parse the given XML and use the document element to create a Python instance".
More specifically, the following line in that function takes the majority of the execution time:
saxer.parse(io.BytesIO(xmld))
where xmld is the XML string that is passed into the function.
However, when I change the type from "xsd:base64Binary" to "xsd:string", read the file into a string and transmit it over the wire, the performance issue simply disappears.
Any ideas or hints as to why this may happen? Any help is greatly appreciated.
Cheers,
James

PyXB uses the Python standard base64 module, via its standard_b64decode and standard_b64encode routines. Looking at the PyXB code, it appears that this module accepts representations that are not permitted by the XML Schema rules. As PyXB is a validating processor, it must check whether the incoming encoded data is a valid XML representation. It does this with a complex regular expression.
In pyxb/binding/datatypes.py, in class base64Binary, you'll see a code block:
# This is what it costs to try to be a validating processor.
if cls.__Lexical_re.match(xmlt) is None:
    raise SimpleTypeValueError(cls, xmlt)
Try commenting out that check.
If you'd like a workaround for this to be part of future releases, please file an issue on GitHub.
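To illustrate how lenient Python's standard Base64 decoder is, here is a small sketch using only the standard library. The input string is made up for illustration, and validate=True is shown only as the stdlib's own stricter mode; it is not claimed to match the XML Schema lexical rules that PyXB checks.

```python
import base64
import binascii

# Made-up input: "QUJD" is the Base64 encoding of b"ABC"; the "!" is not
# part of the Base64 alphabet.
dirty = "QU!JD"

# By default the standard decoder silently discards characters outside the
# alphabet, so the malformed input still decodes.
lenient_result = base64.standard_b64decode(dirty)
print(lenient_result)

# b64decode(..., validate=True) is stricter and rejects the same input.
try:
    base64.b64decode(dirty, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True
print(strict_rejected)
```

This kind of silent acceptance is why a validating processor cannot rely on the decoder alone and needs a separate check on the incoming text.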
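Before commenting anything out, it may help to get a rough feel for what a regex-based lexical check costs next to the decode itself. The sketch below is only an approximation: the pattern is a simplified stand-in made up for this example (it is not the expression PyXB uses), the timed() helper is hypothetical, and the payload is synthetic.

```python
import base64
import re
import time

# Hypothetical, simplified stand-in for a Base64 lexical check.
# This is NOT the regular expression PyXB uses.
B64_LEXICAL_RE = re.compile(
    r"^(?:[A-Za-z0-9+/]{4})*"
    r"(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$"
)


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Synthetic 256 KiB payload standing in for an uploaded file.
payload = bytes(range(256)) * 1024
encoded = base64.standard_b64encode(payload).decode("ascii")

# Time the lexical check and the plain decode separately.
match, regex_seconds = timed(B64_LEXICAL_RE.match, encoded)
decoded, decode_seconds = timed(base64.standard_b64decode, encoded)

print("regex check: %.4f s" % regex_seconds)
print("decode only: %.4f s" % decode_seconds)
```

The absolute numbers depend on the machine and Python version, and the real expression in PyXB may behave quite differently, so treat this only as a way to compare the two steps on your own data.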
Thanks Peter! It works perfectly for me, and I have filed an issue on github as well.
Last edit: James 2016-02-10