Class names with Unicode

Help
2016-11-04
2016-11-05
  • Sergey Bushmanov

    This is perhaps a newbie question, but I would really appreciate it if somebody could point me in the right direction.

    I'm unable to generate binding classes with PyXB when element names are non-ASCII.

    The minimal reproducible example:

    <?xml version="1.0" encoding="utf8"?>
    <xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Address">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Country" type="xs:string" />
            <xs:element name="Street" type="xs:string" />
            <xs:element name="Town" type="xs:string" />       
            <xs:element name="Дом" type="xs:string" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
    

    (Look for <xs:element name="Дом" type="xs:string" />, where I use Cyrillic.) The encoding of the file is utf8. However, when I try:

    pyxbgen -u example.xsd -m example

    I get the error:

    Traceback (most recent call last):
      File "/home/sergey/anaconda3/lib/python3.5/xml/sax/expatreader.py", line 210, in feed
        self._parser.Parse(data, isFinal)
    xml.parsers.expat.ExpatError: not well-formed (invalid token): line 9, column 26
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/home/sergey/anaconda3/bin/pyxbgen", line 52, in <module>
        generator.resolveExternalSchema()
    .......
    

    What is the right way to handle element names in Unicode?
    I'm using Python 3.5.

    (I have posted a similar question at http://stackoverflow.com/questions/40428247/pyxb-generating-classes-with-unicode but unfortunately have not received any response there.)

    Thanks in advance!

    Update

    When I change the encoding to cp1251 I get:

    WARNING:pyxb.binding.generate:Element use None.Дом renamed to emptyString
    Python for AbsentNamespace0 requires 1 modules
    

    and I am able to generate a document like:

    '<?xml version="1.0" ?><Address><Country>US</Country><Town>City</Town><Street>Main Street</Street><Дом>13</Дом></Address>'
    

    by assigning the value to emptyString.
    Is there a way to keep the original name Дом?

     

    Last edit: Sergey Bushmanov 2016-11-05
  • Peter A. Bigot - 2016-11-04

    As noted in my response on Stack Overflow (which I don't monitor regularly), the encoding name utf8 should be spelled utf-8, and at this time there is no fix to support non-ASCII identifiers. That is tracked as issue 67 and will probably be fixed for PyXB 1.2.6, but that's likely to be sometime next year.
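
    The spelling point can be checked without PyXB. The following is a minimal sketch that assumes only the standard library's expat binding, which is the parser shown in the traceback above. The helper name try_parse and the reduced schema are made up for illustration. Both documents contain identical UTF-8 bytes; only the encoding named in the XML declaration differs.

```python
import xml.parsers.expat

# Reduced, hypothetical version of the schema from the question.
SCHEMA_TEMPLATE = (
    '<?xml version="1.0" encoding="{enc}"?>\n'
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">\n'
    '  <xs:element name="Дом" type="xs:string"/>\n'
    '</xs:schema>\n'
)


def try_parse(declared_encoding):
    """Return None if expat accepts the document, else the error text."""
    # The bytes are always UTF-8; only the declared encoding name changes.
    data = SCHEMA_TEMPLATE.format(enc=declared_encoding).encode("utf-8")
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except (xml.parsers.expat.ExpatError, ValueError) as exc:
        # ValueError is caught as well in case the encoding lookup itself fails.
        return str(exc)
    return None


for enc in ("utf-8", "utf8"):
    print(enc, "->", try_parse(enc) or "parsed OK")
```

    With the hyphenated name the document should parse. With utf8 the expectation, based on the traceback in the question, is a "not well-formed (invalid token)" error at the first Cyrillic character, though the exact message may vary between Python versions.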

     
  • Sergey Bushmanov

    Unidecode seems to work if I further simplify pyxbgen_jp:

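    import six
    import pyxb.utils.utility
    from unidecode import unidecode

    def ConvertRuIdentifier(identifier):
        # Transliterate the XML name to ASCII so it can serve as a Python identifier.
        return unidecode(six.text_type(identifier))

    # Install the converter before the bindings are generated.
    pyxb.utils.utility._SetXMLIdentifierToPython(ConvertRuIdentifier)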
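
    To see why transliteration changes the generated name, here is a small standard-library sketch. It is not PyXB's code and not unidecode: the three-letter table and both function names are hypothetical stand-ins. It contrasts dropping non-ASCII characters, which leaves nothing for Дом and would plausibly explain a fallback name such as emptyString, with transliterating them first.

```python
# Hypothetical, deliberately tiny table; unidecode covers far more characters.
_CYRILLIC_TO_LATIN = {"Д": "D", "о": "o", "м": "m"}


def strip_non_ascii(name):
    """Drop every non-ASCII character (assumed default-like behavior)."""
    return name.encode("ascii", "ignore").decode("ascii")


def to_ascii_identifier(name):
    """Transliterate known characters, keep ASCII, drop anything else."""
    out = []
    for ch in name:
        if ch in _CYRILLIC_TO_LATIN:
            out.append(_CYRILLIC_TO_LATIN[ch])
        elif ord(ch) < 128:
            out.append(ch)
    return "".join(out)


print(repr(strip_non_ascii("Дом")))  # ''
print(to_ascii_identifier("Дом"))    # Dom
```

    Only the Python attribute name is affected by such a conversion; the XML element name stays as declared in the schema.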
    

    Then I am able to generate and use the bindings:

    !pyxbgen_ru -u example.xsd -m example
    Element use None.Дом renamed to Dom
    Python for AbsentNamespace0 requires 1 modules
    
    import example as e
    address = e.Address()
    address.Country = "US"
    address.Dom = "13"
    
    from lxml import etree
    xml_bytes = address.toxml(encoding="utf-8")
    print(etree.tostring(etree.fromstring(xml_bytes),
                         encoding="utf-8", pretty_print=True).decode("utf-8"))
    

    with output:

    <Address>
      <Country>US</Country>
      <Дом>13</Дом>
    </Address>
    
     

    Last edit: Sergey Bushmanov 2016-11-05
  • Peter A. Bigot - 2016-11-05

    Thanks; I've updated the issue to note that approach should be considered.

     
  • Sergey Bushmanov

    Thank YOU man, your package really helps!

     
