The grammar rules for Nmtoken in the XML spec and NCName in the XML Namespaces spec are defined in terms of Letter, which is itself a union of many character ranges from across the Unicode repertoire.
PyXB, on the other hand, validates these types against substantially simpler, ASCII-only regexes:
    $ grep -n 'A-Za-z' datatypes.py
    920: _ValidRE = re.compile('^[-_.:A-Za-z0-9]*$')
    932: _ValidRE = re.compile('^[A-Za-z_:][-_.:A-Za-z0-9]*$')
    940: _ValidRE = re.compile('^[A-Za-z_][-_.A-Za-z0-9]*$')
This causes PyXB-generated bindings to reject well-formed and valid documents whose names contain non-ASCII letters, for example IDs written in languages other than English.
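The rejection can be reproduced outside PyXB with a minimal sketch. The first pattern is copied from the grep output (line 940). The second is a hypothetical, approximate alternative that relies on Python 3's Unicode-aware `\w`; it is not the exact Letter production from the spec and is not taken from PyXB. The function and constant names are invented for illustration.

```python
import re

# Copied from the grep output (datatypes.py line 940): ASCII letters only.
PYXB_ASCII_NCNAME = re.compile('^[A-Za-z_][-_.A-Za-z0-9]*$')

# Hypothetical approximation, not the spec's Letter production:
# in Python 3, \w on str patterns matches Unicode letters, digits and '_'.
# [^\W\d] therefore means "a word character that is not a digit".
APPROX_UNICODE_NCNAME = re.compile(r'[^\W\d][\w.\-]*')


def ascii_ncname_ok(value):
    """Return True if value passes the ASCII-only pattern."""
    return PYXB_ASCII_NCNAME.match(value) is not None


def approx_unicode_ncname_ok(value):
    """Return True if value passes the approximate Unicode-aware pattern."""
    return APPROX_UNICODE_NCNAME.fullmatch(value) is not None


if __name__ == "__main__":
    samples = (
        "element",
        "\u00e9l\u00e9ment",           # French, precomposed e-acute
        "\u0438\u043c\u044f",          # Russian
        "1abc",                        # invalid: starts with a digit
        "a:b",                         # invalid NCName: contains a colon
    )
    for name in samples:
        print(ascii(name), ascii_ncname_ok(name), approx_unicode_ncname_ok(name))
```

The approximation is probably too permissive in some places and too strict in others (it does not accept combining characters, for instance), so a real fix would need to encode the character ranges from the specs directly.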