Menu

Is PyXB's regex XML regex' compliant?

Help
2016-12-12
2016-12-13
  • Sergey Bushmanov

    I am interested in PyXB's behavior matching 'something|empty string'.

    I put together a simple example like:

    <?xml version="1.0" encoding="utf-8"?>
    <xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Address">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="City" type="xs:string" />
            <xs:element name="Country" minOccurs="0">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="[A-Z]{2}|" />
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
    

    with the expectation of element Country to accept either empty string or 2 letter code
    (my expectations come from XML regex defintion from https://www.w3.org/TR/xmlschema-2/#regexs

    However, when I do:

    !pyxbgen -u example_pattern.xsd -m example
    
    import example as e
    
    address = e.Address()
    
    address.City = "New-York"
    address.Country = "USA"
    

    I'm getting 3-letter Country code accepted and the following xml file produced:

    <Address>
      <City>New-York</City>
      <Country>USA</Country>
    </Address>
    

    So, my question is: is PyXB's regex engine XML compliant?

    A little bit of background:

    1. I got patterns like "[A-Z]{2}|" from xmlSpy generated XSD's.
    2. The above mentioned pattern does not validate either through xmllint or lxml (both throw an error that this is not a valid regex pattern)
    3. I've asked a related question on stackoverflow and got some replies there.
     

    Last edit: Sergey Bushmanov 2016-12-12
  • Sergey Bushmanov

    Well, the answer to my problem is brackets:

    <?xml version="1.0" encoding="utf-8"?>
    <xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Address">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="City" type="xs:string" />
            <xs:element name="Country" minOccurs="0">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="([A-Z]{2}|)" />
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
    

    After putting the pattern in brackets it works as expected.

     

    Last edit: Sergey Bushmanov 2016-12-12
  • Peter A. Bigot

    Peter A. Bigot - 2016-12-12

    PyXB attempts to translate XML regex syntax into Python syntax for execution. It's intended to be correct; if it isn't please open a ticket on github.

     
  • Sergey Bushmanov

    I will. As a temporal remedy, can you please point me to the file where this translation happens?

     

Log in to post a comment.