Re: [Cheetahtemplate-discuss] Cheetah and unicode
From: Tavis R. <ta...@re...> - 2005-08-09 19:34:50
Hi deelan,

thanks for the detailed report. It seems the default EncodeUnicode filter is
returning the value for your 'variable' encoded with UTF8. I think the
default filter should just return the raw unicode string instead.

It's easily fixable, but there may be backwards compatibility issues. In the
short-term, I'm going to add this filter (as soon as sourceforge gets the cvs
server working again!):

class RawOrEncodedUnicode(Filter):
    def filter(self, val,
               encoding=None,
               str=str, type=type, unicodeType=type(u''),
               **kw):
        """Pass Unicode strings through unmolested, unless an encoding is specified.
        """
        if type(val)==unicodeType:
            if encoding:
                filtered = val.encode(encoding)
            else:
                filtered = val
        elif val is None:
            filtered = ''
        else:
            filtered = str(val)
        return filtered

Should we make this the default filter?

Tavis

On Tuesday 09 August 2005 05:05, Andrea wrote:
> Brian Bird wrote:
> > Is there any likelihood that a future version of Cheetah will be able to
> > use unicode strings directly, rather than encoding them itself.
> >
> > Eg.
> > from Cheetah import Template
> > source=u"This is a unicode template with a $variable"
> > namespace={u"variable":u"unicode \xa3string"}
> > t=Template.Template(source=source, searchList=[namespace])
> > output=str(t) # perhaps this should be: output=unicode(t)
> > print output
> > print type(output)
> >
> > This fails because Cheetah expects the "unicode \xa3string" to be able
> > to be encoded into ascii. Instead I would like the variable "output" to
> > be of type unicode instead of type str. At the moment I've encoded
> > everything in utf-8 beforehand (or in a Filter) and then decoded it
> > again at the end - but this seems very inefficient.
>
> Maybe I'm a bit late, but I noticed your post only now. I'm trying to
> use unicode internally in Subway[1] (which uses Cheetah as templating
> engine) and surely your problem interests me.
>
> I have a slightly modified version of your code
>
> from Cheetah import Template
> source=u"This is a unicode template with a $variable"
> namespace={u"variable":u"unicode \xa3string"}
> t=Template.Template(source=source, searchList=[namespace])
> output=t.respond() # <- just to be sure which method is called
> print output
> print type(output)
>
> and got this error:
>
> >python -u "cheetah-test.py"
>
> Traceback (most recent call last):
>   File "cheetah-test.py", line 8, in ?
>     output=t.respond()
>   File "<string>", line 32, in respond
>   File "C:\python\lib\StringIO.py", line 271, in getvalue
>     self.buf += ''.join(self.buflist)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 8:
> ordinal not in range(128)
>
> >Exit code: 1
>
> As you can see it fails when it comes to producing the actual output;
> in particular it uses Python's StringIO module to make up a str out of
> a buffer list.
>
> Inspecting other compiled Cheetah templates, it seems that
> respond() does a lot of write calls, using StringIO as buffer.
> After that comes this code:
>
> return trans.response().getvalue()
>
> that getvalue() is the StringIO method which:
>
> "return whole file's contents as a string"
> (http://pydoc.org/2.4.1/StringIO.html)
>
> and:
>
> ''The StringIO object can accept either Unicode or 8-bit strings,
> but mixing the two may take some care. If both are used, 8-bit
> strings that cannot be interpreted as 7-bit ASCII (that use the
> 8th bit) will cause a UnicodeError to be raised when getvalue()
> is called.''
>
> So it seems that Cheetah is mixing and matching str and unicode
> objects, triggering that UnicodeError.
>
> In fact, using StringIO alone works:
>
> u1 = u"unicode \u00a3string" # pound sign [2]
> u2 = u"another unicode \u00a3string" # pound sign
> from StringIO import StringIO as sio
>
> b = sio()
> b.write(u1)
> b.write(u2)
> print type(b.getvalue()), b.getvalue().encode('latin-1')
>
> >python -u "cheetah-test.py"
>
> <type 'unicode'> unicode £stringanother unicode £string
>
> So, my question to the Cheetah developers is: how could we fix this?
> Ideally, if Cheetah receives a unicode source template and the
> placeholders are unicode objects too, it should return (from respond())
> a unicode object.
>
> Is it fixable?
>
> thanks in advance,
> deelan
>
> [1] http://subway.python-hosting.com/
> [2] http://www.fileformat.info/info/unicode/char/00a3/index.htm
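
For anyone reading this thread on a current interpreter, here is a minimal sketch of the same idea in Python 3, where `str` plays the role of Python 2's `unicode` and encoded text is `bytes`. The names `raw_or_encoded` and `join_chunks` are hypothetical stand-alone helpers for illustration only; they are not part of Cheetah, and the real filter above is a method on a `Filter` subclass. One difference to keep in mind: Python 2 raised `UnicodeDecodeError` when the two types were mixed, while Python 3 raises `TypeError`.

```python
# Python 3 sketch of the filter logic discussed in this thread.
# raw_or_encoded and join_chunks are hypothetical helpers, not Cheetah API.

def raw_or_encoded(val, encoding=None):
    """Pass text through unchanged unless an encoding is requested."""
    if isinstance(val, str):  # Python 3 str corresponds to Python 2 unicode
        return val.encode(encoding) if encoding else val
    if val is None:
        return ''
    return str(val)


def join_chunks(chunks):
    """Join buffered output chunks, roughly what getvalue() has to do."""
    return ''.join(chunks)


template_text = 'This is a unicode template with a '

# Text only: the placeholder value stays text, so the join succeeds.
result = join_chunks([template_text, raw_or_encoded('unicode \xa3string')])

# Mixed: the placeholder value was encoded to UTF-8 bytes before joining.
# This is the str/unicode mixing described in the report above.
try:
    join_chunks([template_text, raw_or_encoded('unicode \xa3string', 'utf-8')])
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

The encoded placeholder value starts with the byte 0xc2 after eight ASCII characters, which matches the byte and position reported in the traceback quoted above. Keeping every chunk as text until the very end, and encoding once at the output boundary, avoids the mix.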