From: Tim L. <guy...@gm...> - 2013-01-30 19:45:20
Thanks Benny and John for your help and advice.

As ever, it turned out to be more complicated than I thought, with the need to support:
- Python 2 and 3,
- tar and ordinary output, and
- various encodings.

I have committed revision 21757 http://sourceforge.net/p/gramps/code/21258/ which I hope fixes all of the above variations.

It turns out (from a print statement I inserted in libhtml.write()) that Python 2 writes mostly 8-bit strings (type=str) but sometimes unicode (type=unicode), while Python 3 always writes text strings (type=str, i.e. unicode). (I am not sure where libhtml does the conversion.) So the data is not always bytes internally.

Also, although you said: "There are a couple of exceptions: xml, html, and xhtml should always be utf-8", I think that HTML can have a variety of encodings [1], though it may be true that it _should_ be UTF-8. Anyway, NarWeb is set up to support different output encodings for the HTML files, so I have kept that support.

Unfortunately, because of the different string types in Python 2 and 3, and the different ways conversion is done, I have had to provide different code for Python 2 and 3.

[1] http://www.w3.org/International/questions/qa-html-encoding-declarations#quicklookup

--
View this message in context: http://gramps.1791082.n4.nabble.com/open-versus-io-open-tp4658183p4658405.html
Sent from the GRAMPS - Dev mailing list archive at Nabble.com.
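For comparison with the separate Python 2 / Python 3 code paths described in the post, here is a minimal sketch of one alternative: normalize every chunk to text, then encode once to bytes in the chosen output encoding, and hand those bytes to either an ordinary file or a tar member. It assumes that any 8-bit strings (Python 2 `str`, Python 3 `bytes`) are UTF-8 encoded. The function names `to_text`, `encode_page`, `write_page` and `add_page_to_tar` are hypothetical; they are not part of libhtml or NarWeb. Only `io`, `tarfile` and `time` from the standard library are used, and the code is written to run unchanged on Python 2.6+ and 3.3+.

```python
import io
import tarfile
import time

def to_text(chunk, source_encoding="utf-8"):
    """Return chunk as a text (unicode) string.

    Assumption: 8-bit strings (Python 2 str / Python 3 bytes) are encoded
    in source_encoding. Text strings are returned unchanged.
    """
    if isinstance(chunk, bytes):  # on Python 2, bytes is an alias of str
        return chunk.decode(source_encoding)
    return chunk

def encode_page(chunks, encoding="utf-8"):
    """Join the chunks and encode them once, in the output encoding.

    Characters the target encoding cannot represent are written as numeric
    character references (e.g. &#8364;), which HTML parsers understand.
    """
    text = u"".join(to_text(c) for c in chunks)
    return text.encode(encoding, "xmlcharrefreplace")

def write_page(path, chunks, encoding="utf-8"):
    """Ordinary output: write the already-encoded bytes in binary mode."""
    with io.open(path, "wb") as f:
        f.write(encode_page(chunks, encoding))

def add_page_to_tar(tar, name, chunks, encoding="utf-8"):
    """Tar output: tarfile needs bytes plus an explicit member size."""
    data = encode_page(chunks, encoding)
    info = tarfile.TarInfo(name)
    info.size = len(data)
    info.mtime = int(time.time())
    tar.addfile(info, io.BytesIO(data))
```

Because encoding happens in one place, the ordinary and tar outputs cannot drift apart. Whatever encoding is passed in still has to match the charset declared inside the page itself (see [1]).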