Hello all,
I'm adding unicode support to ConfigObj.
This is *basically* complete - just a couple of troublesome uses of
``str()`` in validate to nail.
The current situation is that if you supply a UTF8 or UTF16 file with a
BOM, then ConfigObj will autodetect the encoding and decode appropriately.
For UTF16 this is essential: ConfigObj will mangle a byte string with a
16-bit encoding unless it decodes first.
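To make the UTF16 point concrete, here is a rough sketch in plain Python. It is not ConfigObj's actual code: ``detect_and_decode`` and ``BOMS`` are hypothetical names made up for illustration, and only the BOM constants from the standard ``codecs`` module are real. It assumes the file contents arrive as a byte string.

```python
import codecs

# Hypothetical lookup table for illustration; not ConfigObj's internals.
# The BOM constants themselves come from the standard codecs module.
BOMS = (
    (codecs.BOM_UTF8, 'utf_8'),
    (codecs.BOM_UTF16_LE, 'utf_16_le'),
    (codecs.BOM_UTF16_BE, 'utf_16_be'),
)

def detect_and_decode(raw):
    """Return (text, encoding); encoding is None when no BOM is found."""
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            # Strip the BOM, then decode the remaining bytes.
            return raw[len(bom):].decode(encoding), encoding
    # No BOM: hand the byte string back untouched.
    return raw, None

# Two config lines as a little-endian UTF16 byte string with a BOM.
raw = codecs.BOM_UTF16_LE + 'a = 1\nb = 2\n'.encode('utf_16_le')

# Splitting on a single newline byte cuts each 2-byte newline in half,
# so the second "line" begins with a stray NUL byte.
naive_lines = raw[len(codecs.BOM_UTF16_LE):].split(b'\n')

# Decoding first gives clean text lines.
text, encoding = detect_and_decode(raw)
text_lines = text.splitlines()
```

Here ``naive_lines[1]`` starts with ``b'\x00'``, which is the kind of mangling I mean, while ``text_lines`` comes out as the two expected lines.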
For UTF8 this is possibly not what you want: you might want to keep UTF8
byte-strings. (Some applications attempt to avoid unicode issues by
retaining UTF8 encoded byte strings throughout.) You can get round this
by parsing the config file, detecting that the encoding attribute has
been set, and then calling ``encode``.
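
    from configobj import ConfigObj

    # cfg_file is the path (or file object) for your config file
    cfg = ConfigObj(cfg_file)
    # encoding is only set if a BOM was detected and the file decoded
    if cfg.encoding == 'utf_8':
        # convert the unicode members back to UTF8 byte strings
        cfg.encode('utf_8')
        cfg.encoding = None
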
Is this adequate, or should the default behaviour for ConfigObj remain
*ignoring* the encoding (in the case of UTF8) except to preserve the BOM?
All the best,
Fuzzyman
http://www.voidspace.org.uk/python/index.shtml