Thread: [Configobj-develop] Unicode Support in ConfigObj

Hello all,

I'm adding Unicode support to ConfigObj.

This is *basically* complete; just a couple of troublesome uses of
``str()`` in validate remain to be nailed down.

The current situation is that if you supply a UTF-8 or UTF-16 file with a
BOM, ConfigObj will autodetect the encoding and decode the file appropriately.
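A minimal sketch of BOM-based autodetection in Python 3, assuming only the standard ``codecs`` BOM constants. ``sniff_and_decode`` and the ``BOMS`` table are hypothetical names for illustration, not ConfigObj's internals, and UTF-32 BOMs are deliberately left out.

```python
import codecs

# Hypothetical table: each BOM mapped to the codec used to decode the rest.
# None of these prefixes overlaps another, so the order does not matter here.
BOMS = [
    (codecs.BOM_UTF8, 'utf_8'),
    (codecs.BOM_UTF16_LE, 'utf_16_le'),
    (codecs.BOM_UTF16_BE, 'utf_16_be'),
]

def sniff_and_decode(data):
    """Return (encoding, text) for BOM-prefixed bytes, else (None, data).

    Hypothetical helper for illustration; not ConfigObj's actual code.
    """
    for bom, encoding in BOMS:
        if data.startswith(bom):
            # Strip the BOM before decoding so it does not end up in the text.
            return encoding, data[len(bom):].decode(encoding)
    # No BOM found: leave the byte string untouched.
    return None, data

encoding, text = sniff_and_decode(
    codecs.BOM_UTF16_LE + 'key = value\n'.encode('utf_16_le'))
print(encoding, repr(text))  # utf_16_le 'key = value\n'
```

Files without a BOM come back unchanged, which mirrors the behaviour described above: decoding only happens when a BOM is present.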

For UTF-16 this is essential: ConfigObj will mangle a byte string in a
16-bit encoding unless it decodes it first.
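To illustrate why decoding first matters for UTF-16, this Python 3 sketch encodes a two-line config as UTF-16-LE (without a BOM, for simplicity) and splits it on newline bytes the way a byte-oriented parser might. It demonstrates the failure mode only; it is not ConfigObj's parsing code.

```python
# UTF-16-LE stores each ASCII character as two bytes: the character, then a null.
raw = 'a = 1\nb = 2\n'.encode('utf_16_le')

# Splitting the raw bytes on a single b'\n' leaves a stray null byte
# at the start of every line after the first.
naive_lines = raw.split(b'\n')
print(naive_lines[1])  # b'\x00b\x00 \x00=\x00 \x002\x00'

# Decoding first gives clean lines that a text parser can handle.
clean_lines = raw.decode('utf_16_le').splitlines()
print(clean_lines)  # ['a = 1', 'b = 2']
```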

For UTF-8 this is possibly not what you want; you might prefer to keep
UTF-8 byte strings. (Some applications try to avoid Unicode issues by
keeping UTF-8 encoded byte strings throughout.) You can get round this
by parsing the config file, checking whether the ``encoding`` attribute
has been set, and then calling ``encode``:

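from configobj import ConfigObj

cfg_file = 'example.ini'  # hypothetical path to a UTF-8 config file with a BOM
cfg = ConfigObj(cfg_file)
# A UTF-8 BOM was detected, so the values have been decoded to unicode.
if cfg.encoding == 'utf_8':
    # Re-encode the decoded values back to UTF-8 byte strings...
    cfg.encode('utf_8')
    # ...and clear the encoding attribute, since the values are no longer unicode.
    cfg.encoding = None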
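Roughly what an ``encode`` step has to do, sketched in Python 3 on a plain nested dict: walk every section and turn each text key and value (including members of list values) into UTF-8 bytes. ``encode_section`` is a hypothetical helper, not ConfigObj's implementation, and unlike the in-place ``cfg.encode`` call it returns a new dict.

```python
def encode_section(section, encoding='utf_8'):
    """Recursively encode every str key and value in a nested dict to bytes.

    Hypothetical stand-in for an encode step; not ConfigObj's implementation.
    """
    encoded = {}
    for key, value in section.items():
        new_key = key.encode(encoding) if isinstance(key, str) else key
        if isinstance(value, dict):
            # Sub-sections are handled by recursion.
            encoded[new_key] = encode_section(value, encoding)
        elif isinstance(value, list):
            # List values: encode each str member, leave anything else alone.
            encoded[new_key] = [v.encode(encoding) if isinstance(v, str) else v
                                for v in value]
        elif isinstance(value, str):
            encoded[new_key] = value.encode(encoding)
        else:
            encoded[new_key] = value
    return encoded

config = {'name': 'caf\u00e9', 'section': {'items': ['\u00fc', 'x']}}
print(encode_section(config))
# {b'name': b'caf\xc3\xa9', b'section': {b'items': [b'\xc3\xbc', b'x']}}
```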

Is this adequate, or should ConfigObj, by default, continue to *ignore*
the encoding (in the case of UTF-8) except to preserve the BOM?
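For comparison, the alternative default in the question could look something like this Python 3 sketch: set aside a UTF-8 BOM on reading, leave the body as undecoded bytes, and put the BOM back on writing. ``read_keeping_bom`` and ``write_keeping_bom`` are hypothetical names, not part of ConfigObj.

```python
import codecs

def read_keeping_bom(data):
    """Split off a UTF-8 BOM but leave the body as undecoded UTF-8 bytes.

    Hypothetical helper sketching the 'ignore the encoding, keep the BOM' option.
    """
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data

def write_keeping_bom(has_bom, body):
    """Re-attach the BOM (if there was one) when writing the bytes back out."""
    return codecs.BOM_UTF8 + body if has_bom else body

original = codecs.BOM_UTF8 + 'name = caf\u00e9\n'.encode('utf_8')
has_bom, body = read_keeping_bom(original)
# body stays a UTF-8 byte string; only the BOM was set aside.
round_trip = write_keeping_bom(has_bom, body)
```

Because the body is never decoded, values stay as byte strings and a read/write round trip reproduces the file byte for byte.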

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
