From: Paul P. <pa...@pr...> - 2002-01-11 23:18:09
Finn Bock wrote:
>
>...
>
> Is a no-op a normal use? Maybe when porting CPython applications?

It wasn't a porting project when Brian ran into it. But maybe he was thinking in CPython terms as he was coding. Is it accurate to say that, either way, the single-arg unicode() call is useless in a made-for-Jython application?

If so, the question is how to emulate CPython most directly. And then it is really a probabilistic issue, because you can never get 100% if CPython has two string types where Jython has only one.

If you turn it into a no-op, you will have the effect in some cases of making Latin1 the default encoding (which is what I have proposed for CPython). Consider:

If this is a no-op:

    >>> x=u'\x81'
    >>> x=unicode(x)

Then so is this:

    >>> x='\x81'
    >>> x=unicode(x)

I have no problem with that, myself, but it is precisely the proposal that caused the heated i18n flamewars.

>...
>
> In jython the IMO obvious default would be the file.encoding property.
> Maybe I should have picked that default when I added unicode support,
> but after seeing the casualties of that discussion on python-dev, I
> didn't dare.

Still, that wouldn't be a cure-all. In CPython, this would always be a no-op:

    unicode(unicode(unicode(unicode(u"..."))))

In Jython, it would decode according to file.encoding several times, potentially changing the string every time.

Perhaps the one-arg version of the unicode function should simply be illegal (deprecated) when applied to strings. The caller would then have to name the encoding. If you want to decode according to file.encoding, you could. If you want to decode according to sys.getdefaultencoding(), you could. If you want to hard-code ASCII, you could.

It is probably an illusion to think that people can work with Unicode without thinking about encodings anyhow. That's another reason I don't think that file.encoding and sys.getdefaultencoding() are particularly useful. Ten years ago it made sense to guess at the file encoding based on the user's machine locale and OS. Today, I don't think it does.
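To make the repeated-decoding problem concrete, here is a rough sketch. It is written in present-day Python 3 rather than the CPython 2 / Jython of this thread, since unicode() no longer exists there. The helper one_type_decode is hypothetical, not a Jython or CPython API: it only imitates an interpreter with a single string type that decodes the "bytes" held in a string under some implicit default encoding.

```python
# Hypothetical helper (not a real Jython/CPython API): imitate a one-arg
# unicode() in an interpreter with a single string type, by treating each
# character (0-255) as a byte and decoding those bytes with `encoding`.
def one_type_decode(s, encoding):
    raw = s.encode("latin-1")   # code points 0-255 map one-to-one onto bytes
    return raw.decode(encoding)

text = "\xc3\x83\xc2\xa9"       # four characters, all below 256

# With Latin-1 as the implicit default, decoding is always a no-op,
# because Latin-1 maps byte n to code point n.
same = one_type_decode(text, "latin-1")

# With another default (UTF-8 stands in for file.encoding here, as an
# assumption), every call can change the string again.
once = one_type_decode(text, "utf-8")
twice = one_type_decode(once, "utf-8")

# Naming the encoding explicitly removes the guesswork.
explicit = b"\x81".decode("latin-1")

# A hard-coded ASCII decode rejects the same byte instead of guessing.
try:
    b"\x81".decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

In this sketch the Latin-1 variant leaves the string untouched, while the UTF-8 variant shrinks it from four characters to two and then to one, which is the "changing the string every time" behaviour described above.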
When push comes to shove, the end-user must specify the encoding of the data rather than expecting operating systems or interpreters to guess based on the locale.

 Paul Prescod