Thread: [SQLObject] Big API changes are coming
SQLObject is a Python ORM.
From: Oleg B. <ph...@ph...> - 2006-10-23 13:23:12
|
Hello!

I am going to start working on a number of rather large issues, and a few of them are really big changes. I will describe them in greater detail below, but first I want to ask a question: how should I handle the changes? I can release SQLObject 0.8 from the trunk and then start to work on these large issues; then they will be in SQLObject 0.9. Or I can continue to work in the trunk now, so the changes will be in SQLObject 0.8, but the release will come much later. Which is better for SQLObject users?

The biggest issue I am going to work on is Unicode support. It is now clear that my original decision to allow every UnicodeCol to have its own dbEncoding was a mistake. I would like to repair it. I am going to remove dbEncoding from UnicodeCol columns and move it to DBConnection. Every connection could have a dbEncoding; the encoding will be used to convert:

-- string queries to unicode and unicode results to strings for those DB API drivers that accept and return unicode: the latest MySQLdb and PySQLite;
-- unicode queries to strings and string results to unicode columns for those DB API drivers that don't work with unicode.

The second big issue is %-encoded DB URIs. I'd like to change the URI encoding to be a proper %-encoding. This allows users to use standard tools (urllib) to generate queries that contain special characters, and allows developers to use the same standard libraries to parse the encoded URIs. But it is a big change, as it requires everyone to re-encode their DB URIs.

I am going to start obsoleting support for Python 2.2. The first step will be to issue a warning. I am not sure whether SQLObject 0.8 should issue the warning, or whether I should wait for SQLObject 0.9.

There are also a few lesser issues, not as world-shattering. I am going to work on sqlbuilder.Select() to synchronize its features with SQLObject.select(). An even lesser one is to implement fromDatabase for SQLite, the last database that lacks the feature.

Oleg.
-- 
Oleg Broytmann http://phd.pp.ru/ ph...@ph...
Programmers don't die, they just GOSUB without RETURN. |
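To make the %-encoding proposal concrete, here is a minimal sketch in modern Python 3, where `urllib.parse` provides the quoting functions that lived in `urllib` in the Python 2 of this thread. The URI layout `scheme://user:password@host/database` and the helper name `build_db_uri` are assumptions for illustration only; this is not SQLObject's actual URI parser or API.

```python
from urllib.parse import quote, unquote, urlsplit


def build_db_uri(scheme, user, password, host, database):
    """Hypothetical helper: percent-encode the parts that may hold special characters."""
    return "%s://%s:%s@%s/%s" % (
        scheme,
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        quote(database, safe=""),
    )


# A password containing '@', ':' and '/' would break a naive URI.
uri = build_db_uri("mysql", "app", "p@ss:w/rd", "localhost", "shop")

# The same standard library can take the URI apart again.
parts = urlsplit(uri)
# urlsplit leaves the user-info part percent-encoded, so decode it explicitly.
decoded_password = unquote(parts.password)
```

With proper %-encoding, the special characters survive a round trip through the standard library, which is the benefit the message describes.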
From: Jorge G. <jg...@gm...> - 2006-10-23 13:43:10
|
Oleg Broytmann <ph...@ph...> writes:

> Hello! I am going to start working on a number of rather large issues, and
> a few of them are really big changes. I will describe them in greater detail
> below, but first I want to ask a question: how should I handle the changes?
> I can release SQLObject 0.8 from the trunk and then start to work on these
> large issues; then they will be in SQLObject 0.9. Or I can continue to
> work in the trunk now, so the changes will be in SQLObject 0.8, but the
> release will come much later. Which is better for SQLObject users?

If I were you, I'd adopt the first approach: release 0.8 and then start with 0.9.

> I am going to start obsoleting support for Python 2.2. The first step
> will be to issue a warning. I am not sure whether SQLObject 0.8 should issue
> the warning, or whether I should wait for SQLObject 0.9.

I'd do that as early as possible so that people have time to study. If you can postpone that obsolescence to 1.0 (it looks like it, since you're asking if it should start with 0.9) then start now, keep it in 0.9 and deprecate Python 2.2 in 1.0...

-- 
Jorge Godoy <jg...@gm...> |
From: Oleg B. <ph...@ph...> - 2006-10-23 14:05:18
|
On Mon, Oct 23, 2006 at 10:40:21AM -0300, Jorge Godoy wrote:
> If I were you, I'd adopt the first approach: release 0.8 and then start
> with 0.9.
>
> > I am going to start obsoleting support for Python 2.2. The first step
> > will be to issue a warning. I am not sure whether SQLObject 0.8 should issue
> > the warning, or whether I should wait for SQLObject 0.9.
>
> I'd do that as early as possible so that people have time to study. If you

Considering your first piece of advice, I read this as "release 0.8 now, add the warning in 0.9, remove support after 0.9."

> can postpone that obsolescence to 1.0 (it looks like it, since you're asking if
> it should start with 0.9) then start now, keep it in 0.9 and deprecate Python
> 2.2 in 1.0...

The release after 0.9 will not necessarily be numbered 1.0. It could be 0.10 ;) It will depend on its quality.

Oleg.
-- 
Oleg Broytmann http://phd.pp.ru/ ph...@ph...
Programmers don't die, they just GOSUB without RETURN. |
From: Jorge G. <jg...@gm...> - 2006-10-23 14:18:56
|
Oleg Broytmann <ph...@ph...> writes:

> On Mon, Oct 23, 2006 at 10:40:21AM -0300, Jorge Godoy wrote:
>> If I were you, I'd adopt the first approach: release 0.8 and then start
>> with 0.9.
>>
>> > I am going to start obsoleting support for Python 2.2. The first step
>> > will be to issue a warning. I am not sure whether SQLObject 0.8 should issue
>> > the warning, or whether I should wait for SQLObject 0.9.
>>
>> I'd do that as early as possible so that people have time to study. If you
>
> Considering your first piece of advice, I read this as "release 0.8 now, add the
> warning in 0.9, remove support after 0.9."

I was thinking more of "add the warning in 0.8, keep the warning in 0.9 and remove support in the next release".

> The release after 0.9 will not necessarily be numbered 1.0. It could
> be 0.10 ;) It will depend on its quality.

Indeed... :-)

-- 
Jorge Godoy <jg...@gm...> |
From: Oleg B. <ph...@ph...> - 2006-10-23 14:58:54
|
On Mon, Oct 23, 2006 at 11:15:18AM -0300, Jorge Godoy wrote: > I was thinking more on "add the warning in 0.8, keep the warning in 0.9 and > remove support in the next release". I see. Oleg. -- Oleg Broytmann http://phd.pp.ru/ ph...@ph... Programmers don't die, they just GOSUB without RETURN. |
From: Jorge V. <jor...@gm...> - 2006-10-23 16:01:08
|
On 10/23/06, Oleg Broytmann <ph...@ph...> wrote:
> Hello! I am going to start working on a number of rather large issues, and
> a few of them are really big changes. I will describe them in greater detail
> below, but first I want to ask a question: how should I handle the changes?
> I can release SQLObject 0.8 from the trunk and then start to work on these
> large issues; then they will be in SQLObject 0.9. Or I can continue to
> work in the trunk now, so the changes will be in SQLObject 0.8, but the
> release will come much later. Which is better for SQLObject users?

I agree with Jorge, please send out 0.8. TG will benefit from a stable release instead of using the 0.7bugfix.

> The biggest issue I am going to work on is Unicode support. It is now
> clear that my original decision to allow every UnicodeCol to have its own
> dbEncoding was a mistake. I would like to repair it. I am going to remove
> dbEncoding from UnicodeCol columns and move it to DBConnection. Every
> connection could have a dbEncoding; the encoding will be used to convert:

I'd really like that, especially support for LIKE and other operators.

> -- string queries to unicode and unicode results to strings for those DB
> API drivers that accept and return unicode: the latest MySQLdb and PySQLite;
> -- unicode queries to strings and string results to unicode columns for
> those DB API drivers that don't work with unicode.
>
> The second big issue is %-encoded DB URIs. I'd like to change the URI
> encoding to be a proper %-encoding. This allows users to use standard tools
> (urllib) to generate queries that contain special characters, and allows
> developers to use the same standard libraries to parse the encoded
> URIs. But it is a big change, as it requires everyone to re-encode their DB URIs.
>
> I am going to start obsoleting support for Python 2.2. The first step
> will be to issue a warning. I am not sure whether SQLObject 0.8 should issue
> the warning, or whether I should wait for SQLObject 0.9.

If 0.8 is going out "real soon" I think it could wait for 0.9, unless it's making other enhancements impossible.

> There are also a few lesser issues, not as world-shattering. I am going
> to work on sqlbuilder.Select() to synchronize its features with
> SQLObject.select().

That's a good idea. +1

> An even lesser one is to implement fromDatabase for SQLite,
> the last database that lacks the feature.
>
> Oleg.
> --
> Oleg Broytmann http://phd.pp.ru/ ph...@ph...
> Programmers don't die, they just GOSUB without RETURN. |
From: sophana <so...@zi...> - 2006-10-23 20:00:01
|
If 0.8 is stable, why not release it indeed.

What about the issue of a caching algorithm that maintains a constant cache memory size (so that you don't have to flush the cache from time to time)? This algorithm would delete cached entries once a specified number of entries is reached. (I don't know what the difficulty with that is.) My website database hasn't reached a critical size yet, so I'm not affected yet, but I would find this really annoying.

I would also like to thank Oleg for his GREAT job on sqlobject!

Sophana |
From: Oleg B. <ph...@ph...> - 2006-10-23 20:10:23
|
On Mon, Oct 23, 2006 at 10:00:29PM +0200, sophana wrote:
> If 0.8 is stable, why not release it indeed.

I am still waiting for MySQLdb users to report how stable it is with regard to unicode. ;)

> What about the issue of a caching algorithm that maintains a constant
> cache memory size (so that you don't have to flush the cache from time to time)?
> This algorithm would delete cached entries once a specified number of
> entries is reached. (I don't know what the difficulty with that is.)
> My website database hasn't reached a critical size yet, so I'm not
> affected yet, but I would find this really annoying.

There was a patch in SQLObject 0.7.1 called the "cull patch" that does exactly this. If it doesn't, that should be investigated and fixed.

> I would also like to thank Oleg for his GREAT job on sqlobject!

Thank you, and many thanks to all patch submitters and testers!

Oleg.
-- 
Oleg Broytmann http://phd.pp.ru/ ph...@ph...
Programmers don't die, they just GOSUB without RETURN. |
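The idea of a cache that stays at a fixed size by deleting entries once a limit is reached can be sketched as follows. This is only an illustration of the general technique, assuming a least-recently-used eviction policy; the class name `BoundedCache` is hypothetical, and nothing here is taken from SQLObject's cache or from the "cull patch".

```python
from collections import OrderedDict


class BoundedCache:
    """Illustrative cache that evicts the least recently used entry
    once more than max_entries items are stored."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict from the "oldest" end until the size limit holds again.
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)

    def __len__(self):
        return len(self._data)


cache = BoundedCache(max_entries=2)
cache.put(1, "row 1")
cache.put(2, "row 2")
cache.get(1)           # touch row 1 so that row 2 becomes the oldest
cache.put(3, "row 3")  # exceeds the limit, so row 2 is dropped
```

Because eviction happens on every insert, memory use stays bounded without a periodic flush.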
From: sophana <so...@zi...> - 2006-10-23 20:54:17
|
Oleg Broytmann wrote:
> On Mon, Oct 23, 2006 at 10:00:29PM +0200, sophana wrote:
>> If 0.8 is stable, why not releasing it indeed.
>
> I am still waiting for MySQLdb users to report how stable is it in
> regard with unicode. ;)

I already replied. My website now works just fine (in the sqlobject part). I did have to use both sqlobjectEncoding + charset parameter for it to work. (Wasn't it supposed to be only one?)

I'm still suffering from the python bug where, when you add a string to a unicode with +=, the string is encoded into ascii. I still don't understand why python didn't merge unicode and strings.

>> What about the issue on a caching algorithm that maintains a constant
>> cache memory size (not having to flush the cache from time to time)
>> This algorithm would delete cached entries after a specified number of
>> entries is reached. (I don't know what is the difficulty about that)
>> My website database doesn't reach a critical size yet, so I'm not
>> affected yet, but I would find this really annoying.
>
> There was a patch in SQLObject 0.7.1 called "cull patch" that does
> exactly this. If it doesn't - that should be investigated and fixed.

Unfortunately, I can't test it. But it is a good thing that it does exist.

>> I would also like to thank Oleg for his GREAT job on sqlobject!
>
> Thank you, and many thanks to all patch submitters and testers!
>
> Oleg. |
From: Markus G. <m.g...@gm...> - 2006-10-23 21:09:07
|
On 10/23/06, sophana <so...@zi...> wrote:
> I'm still suffering on the python bug when you add a string to an
> unicode with +=, the string is encoded into ascii.

This is NOT a Python bug.

> I still don't
> understand why python didn't merge unicode and strings.

Because you cannot add a string to a unicode object. If you do this without specifying an encoding, it usually reveals a bug in your code. Without an explicit encoding to convert the string to a unicode object, Python uses ASCII as the default encoding. It is important to understand this; otherwise it is likely that a program works just "by accident" and will fail on different input.

Read http://www.joelonsoftware.com/articles/Unicode.html

Then google for Python and unicode. |
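The point about specifying an encoding can be shown in a short sketch. It is written for Python 3, where the distinction discussed here is spelled differently: `bytes` plays the role of the Python 2 `str` type and `str` that of `unicode`, and Python 3 refuses to mix them at all instead of silently decoding as ASCII. The sample strings are arbitrary.

```python
# Python 3 names: `bytes` ~ Python 2 `str`, `str` ~ Python 2 `unicode`.
text = "leite com "
raw = "café".encode("utf-8")  # bytes as they might arrive from outside the program

# Decoding with the ASCII codec (what Python 2 did implicitly) fails
# as soon as a non-ASCII byte shows up.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Naming the real encoding makes the concatenation well defined.
result = text + raw.decode("utf-8")
```

The explicit `decode("utf-8")` is the step that the implicit conversion hides, and it is the place where a wrong assumption about the input encoding becomes visible.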
From: Oleg B. <ph...@ph...> - 2006-10-24 08:25:22
|
On Mon, Oct 23, 2006 at 10:54:52PM +0200, sophana wrote:
> > I am still waiting for MySQLdb users to report how stable it is with
> > regard to unicode.
>
> I already replied. My website now works just fine (in the sqlobject
> part).

Thank you for the report!

> I did have to use both sqlobjectEncoding + charset parameter for
> it to work. (Wasn't it supposed to be only one?)

No, not yet. There were three that I merged into two, and the next step will be to use one encoding. BTW, that step will be harder for MySQL because MySQL uses different naming schemes. Even worse, MySQL uses different encoding names in different versions, so I will need to dig the information out of the MySQL docs and write a few mappings from Python encoding names to MySQL names.

Oleg.
-- 
Oleg Broytmann http://phd.pp.ru/ ph...@ph...
Programmers don't die, they just GOSUB without RETURN. |
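A mapping from Python encoding names to MySQL names could look roughly like the sketch below. The table is partial and purely illustrative: the pairings would need to be checked against the MySQL documentation for each server version, and the helper name `mysql_charset` is hypothetical, not part of SQLObject or MySQLdb. The sketch uses `codecs.lookup()` to collapse the many aliases Python accepts into one canonical name first.

```python
import codecs

# Partial, illustrative table: canonical Python codec name -> MySQL charset name.
# These pairings are assumptions to be verified against the MySQL docs.
PYTHON_TO_MYSQL = {
    "utf-8": "utf8",
    "iso8859-1": "latin1",
    "koi8-r": "koi8r",
    "cp1251": "cp1251",
    "ascii": "ascii",
}


def mysql_charset(python_encoding):
    """Hypothetical helper: map any alias Python accepts to a MySQL charset name."""
    # codecs.lookup() normalizes aliases such as 'UTF8' or 'latin-1'.
    canonical = codecs.lookup(python_encoding).name
    try:
        return PYTHON_TO_MYSQL[canonical]
    except KeyError:
        raise ValueError("no MySQL charset known for %r" % python_encoding)
```

A per-version table would be needed on top of this if the server's names differ between MySQL releases, as the message notes.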
From: Ronald O. <ron...@ma...> - 2006-10-24 08:38:41
|
On Oct 23, 2006, at 10:54 PM, sophana wrote:
>
> I'm still suffering on the python bug when you add a string to an
> unicode with +=, the string is encoded into ascii. I still don't
> understand why python didn't merge unicode and strings.

Could you give an example? My understanding of what you write:

    Python 2.4.4 (#1, Oct 18 2006, 10:34:39)
    [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> s = u'hello'
    >>> s += 'world'
    >>> s
    u'helloworld'
    >>>

If you add a str() value to a unicode() using +=, this behaves just as I expect. As Markus already noted, relying on this behaviour is bad outside of tightly constrained boundaries (such as when the str() value is a constant that you know to be ASCII). In general it is much better to be explicit about conversions between str and unicode; otherwise you'll one day run into input where the default conversion raises an exception.

Ronald |
From: Jorge G. <jg...@gm...> - 2006-10-24 10:18:50
|
Ronald Oussoren <ron...@ma...> writes:

> On Oct 23, 2006, at 10:54 PM, sophana wrote:
>
>>
>> I'm still suffering on the python bug when you add a string to an
>> unicode with +=, the string is encoded into ascii. I still don't
>> understand why python didn't merge unicode and strings.
>
> Could you give an example? My understanding of what you write:
>
>     Python 2.4.4 (#1, Oct 18 2006, 10:34:39)
>     [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> s = u'hello'
>     >>> s += 'world'
>     >>> s
>     u'helloworld'
>     >>>

Change the order.

    >>> s = u'olá'
    >>> s += 'mundo'
    >>> s
    u'ol\xe1mundo'
    >>> s = u'leite com '
    >>> s += 'café'
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
    >>>

-- 
Jorge Godoy <jg...@gm...> |
From: Hartmut G. <h.g...@go...> - 2006-10-24 10:31:01
|
Jorge Godoy wrote:
> Change the order.
>
>     >>> s = u'olá'
>     >>> s += 'mundo'
>     >>> s
>     u'ol\xe1mundo'
>     >>> s = u'leite com '
>     >>> s += 'café'
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>     UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

This is correct, since you tell python to add a Unicode-String to an Asc-String:
    s += 'café'
is the same as
    s = s + 'café'

BTW: Using string concatenation is a bad habit, since it's slow. One should prefer the % operator.

-- 
Schönen Gruß - Regards
Hartmut Goebel
| Hartmut Goebel | IT-Security -- effizient |
| h.g...@go... | www.goebel-consult.de | |
From: sophana <so...@zi...> - 2006-10-24 10:51:59
|
Hartmut Goebel wrote:
> Jorge Godoy wrote:
>> Change the order.
>>
>>     >>> s = u'olá'
>>     >>> s += 'mundo'
>>     >>> s
>>     u'ol\xe1mundo'
>>     >>> s = u'leite com '
>>     >>> s += 'café'
>>     Traceback (most recent call last):
>>       File "<stdin>", line 1, in ?
>>     UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

This is indeed the typical, very annoying feature I was talking about. In a web application, these strings with unicode inside come from the web requests. Why isn't the string class simply replaced by the unicode class? Why 2 classes?

> This is correct, since you tell python to add a Unicode-String to an
> Asc-String:
>     s += 'café'
> is the same as
>     s = s + 'café'

OK, but why is the right string encoded into ascii and not into the same encoding as the left unicode string? Isn't the += operator a unicode method?

> BTW: Using string concatenation is a bad habit, since it's slow. One
> should prefer the % operator.

Why would the % operator be faster than string concatenation? There is much less work to do! |
From: Oleg B. <ph...@ph...> - 2006-10-24 11:15:11
|
On Tue, Oct 24, 2006 at 12:41:24PM +0200, sophana wrote: > Why isn't the string class simply replaced by the unicode > class? Why 2 classes? To not break existing programs. Python 3000 will have only unicode strings. Oleg. -- Oleg Broytmann http://phd.pp.ru/ ph...@ph... Programmers don't die, they just GOSUB without RETURN. |
From: Dan P. <da...@ag...> - 2006-10-24 11:15:58
|
On Tuesday 24 October 2006 13:41, sophana wrote:
> > This is correct, since you tell python to add a Unicode-String to an
> > Asc-String:
> >     s += 'café'
> > is the same as
> >     s = s + 'café'
>
> Ok, but why is the right string encoded into ascii and not into the
> same encoding as the left unicode string?
> Isn't the += operator an unicode method?

You seem to be very confused about this. The string on the right doesn't need to be encoded to unicode, it needs to be decoded from whatever encoding it has back to unicode. Since you haven't specified any, it automatically picks ASCII. The unicode string on the left doesn't have any encoding attached to be used as you wish, simply because a unicode string doesn't have an encoding attribute. An encoding is used to convert unicode to a string and the resulting string will have a given encoding.

-- 
Dan |
From: <pk...@gm...> - 2006-10-24 11:35:54
|
sophana wrote:
>>>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
>>>> 3: ordinal not in range(128)
>>
> This is indeed the typical very annoying feature I was talking about. In
> web application, these strings with unicode inside come from the web
> requests. Why isn't the string class simply replaced by the unicode
> class? Why 2 classes?

Legacy. But please!!! There are NO "strings with unicode inside".

>> This is correct, since you tell python to add a Unicode-String to an
>> Asc-String:

There are NO "unicode strings".

>>     s += 'café'
>> is the same as
>>     s = s + 'café'
>>
> Ok, but why is the right string encoded into ascii and not into the same
> encoding as the left unicode string?

There are NO "unicode strings". Unicode objects don't have encodings.

> Isn't the += operator an unicode method?

I guess unicode objects have __iadd__ defined, but the point here is: unicode objects and byte strings are two different data types! If you concatenate them, one of them gets encoded/decoded with the default encoding, which happens to be ascii by default. If you have characters that are not in ascii, the conversion will fail.

The solution is: know where you have byte strings and where you have unicode objects. If you have a form, its parameters will be byte strings encoded with the encoding of the html page. The database stores byte strings and has an encoding as well. As a general rule you should use unicode objects in your program and know the boundaries where data comes in (forms) or gets serialized (database). Encode/decode at those boundaries and you are safe.

cheers
Paul |
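The "encode/decode at the boundaries" rule can be sketched in a few lines. The example is written for Python 3, where byte strings are `bytes` and unicode objects are `str`; the function name `handle_form_value`, its parameters, and the sample value are hypothetical and stand in for whatever form-handling and database code an application really has.

```python
def handle_form_value(raw_value, page_encoding="utf-8", db_encoding="utf-8"):
    """Hypothetical handler: decode on the way in, work on text,
    encode on the way out."""
    # Boundary 1: the form delivers bytes in the encoding of the HTML page.
    text = raw_value.decode(page_encoding)
    # Inside the application only text (unicode) is handled.
    text = text.strip().title()
    # Boundary 2: the database layer receives bytes in its own encoding.
    return text.encode(db_encoding)


# A latin-1 encoded form field ends up stored as UTF-8 bytes.
stored = handle_form_value("  josé  ".encode("latin-1"), page_encoding="latin-1")
```

Keeping both conversions at the edges means the middle of the program never has to guess which encoding a value is in.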
From: Peter B. <pet...@14...> - 2006-10-24 22:31:44
Attachments:
request.py
|
> The solution is: Know where you have byte strings and where you have
> unicode objects. If you have a form, parameters will be byte strings
> encoded with the encoding of the html page. The database stores byte
> strings and has an encoding as well. As a general rule you should use
> unicode objects in your program and know the boundaries where data comes
> in (forms) or gets serialized (database). Encode/decode at those
> boundaries and you are safe.

Just for interest, here is how I do this encoding/decoding for CGI input. There are two encodings defined, the user's preferred encoding (as sent by the browser in the HTTP_ACCEPT_CHARSET header) and the application encoding (as used by the database). I use codecs.EncodedFile (with the preferred and application encodings reversed) to encode the output before it's sent to the browser. I hope this helps others in understanding how encoding/decoding works with web applications; it took me a while to figure it out!

Cheers
Peter |
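For readers without the attachment, here is a small, independent sketch of how `codecs.EncodedFile` transcodes output. It is not a reproduction of request.py. It assumes an application encoding of UTF-8 and a browser-preferred encoding of Latin-1, uses an in-memory `io.BytesIO` in place of the real response stream, and is written for Python 3.

```python
import codecs
import io

APP_ENCODING = "utf-8"          # assumed: encoding used by the application/database
PREFERRED_ENCODING = "latin-1"  # assumed: encoding the browser asked for

browser = io.BytesIO()  # stands in for the stream that goes to the browser

# Data written to the wrapper is decoded from data_encoding (second argument)
# and written to the underlying file in file_encoding (third argument).
out = codecs.EncodedFile(browser, APP_ENCODING, PREFERRED_ENCODING)
out.write("café".encode(APP_ENCODING))  # the application writes UTF-8 bytes

sent = browser.getvalue()  # the bytes the browser would receive
```

Swapping the two encoding arguments gives the wrapper for the opposite direction, where data in the browser's encoding is turned into the application encoding.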
From: sophana <so...@zi...> - 2006-10-25 12:20:54
|
Peter Butler wrote:
>
>> The solution is: Know where you have byte strings and where you have
>> unicode objects. If you have a form, parameters will be byte strings
>> encoded with the encoding of the html page. The database stores byte
>> strings and has an encoding as well. As a general rule you should use
>> unicode objects in your program and know the boundaries where data comes
>> in (forms) or gets serialized (database). Encode/decode at those
>> boundaries and you are safe.
>
> Just for interest, here is how I do this encoding/decoding for CGI
> input. There are two encodings defined, the user's preferred encoding
> (as sent by the browser in the HTTP_ACCEPT_CHARSET header) and the
> application encoding (as used by the database). I use
> codecs.EncodedFile (with the preferred and application encodings
> reversed) to encode the output before it's sent to the browser. I
> hope this helps others in understanding how encoding/decoding works
> with web applications, it took me a while to figure it out!

Do you think it's worth bothering about the user's browser capability? UTF8 is supported by most browsers. I specify (force) my pages' encoding and hope that it will be accepted (and I think it is...). The setdefaultencoding method is the simplest. I don't need portability (it is portable enough for me...) |
From: Peter B. <pet...@14...> - 2006-10-25 21:42:56
|
> Do you think it's worth bothering about the user browser capability?
> UTF8 is supported by most browsers.

I think it's definitely worth bothering for user input. If the user has set their browser encoding to Big5 and the app is trying to decode using UTF-8, then the input probably won't make much sense.

> I specify (force) my pages encoding and hope that it will be accepted
> (and I think it is...)

I'd rather not gamble with that, and it's easy enough to implement so that it is likely to work (see my previous message). My view is that if for some reason the browser doesn't accept UTF-8 (not likely, but given the number of buggy browsers out there, still possible) then the user is not likely to complain; they will just go to another site.

> The setdefaultencoding method is the simplest. I don't need portability
> (it is enough portable for me...)

I think you will regret that decision one day, but it's up to you!

Cheers
Peter |
From: Dan P. <da...@ag...> - 2006-10-24 11:06:24
|
On Tuesday 24 October 2006 13:30, Hartmut Goebel wrote:
> BTW: Using string-concationation is bad habbit, since it's slow. One
> should prever fthe % operator.

That is a common myth. People should test things themselves instead of believing everything that is written on the net.

Here are some test results that contradict the above statement in every scenario I tried. Results were obtained by repeating each of the string operations displayed below 1000000 times in each test:

time = 0.55 sec; rate = 1806698 requests/sec; s = 'foo ' + s2
time = 0.77 sec; rate = 1305975 requests/sec; s = 'foo %s' % s2
time = 0.69 sec; rate = 1440251 requests/sec; s = 'foo ' + s2 + ' bar'
time = 0.73 sec; rate = 1375230 requests/sec; s = 'foo %s bar' % s2
time = 0.69 sec; rate = 1455167 requests/sec; s = 'foo ' + s2 + s3
time = 0.97 sec; rate = 1028526 requests/sec; s = 'foo %s %s' % (s2, s3)
time = 1.00 sec; rate = 1000144 requests/sec; s = 'foo ' + s2 + s3 + s4
time = 1.15 sec; rate = 867080 requests/sec; s = 'foo %s %s %s' % (s2, s3, s4)

As you can see, string concatenation can be more than 30% faster than the % operator. The 3-string concatenation was even faster than the % operator with a single argument.

-- 
Dan |
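Anyone who wants to repeat this kind of measurement can use the standard `timeit` module. The sketch below is not the script that produced the numbers above; the sample values in `SETUP`, the statement list, and the helper name `benchmark` are assumptions for illustration, and the timings will differ between interpreter versions and machines.

```python
import timeit

# Assumed sample operands; the original message does not say what s2 and s3 held.
SETUP = "s2 = 'spam'; s3 = 'eggs'"

STATEMENTS = [
    "s = 'foo ' + s2",
    "s = 'foo %s' % s2",
    "s = 'foo ' + s2 + s3",
    "s = 'foo %s %s' % (s2, s3)",
]


def benchmark(number=100000):
    """Return {statement: seconds} for `number` repetitions of each statement."""
    return {
        stmt: timeit.timeit(stmt, setup=SETUP, number=number)
        for stmt in STATEMENTS
    }


if __name__ == "__main__":
    for stmt, seconds in benchmark().items():
        print("time = %.3f sec; %s" % (seconds, stmt))
```

Running it on the interpreter actually in use follows the advice in the message: measure rather than rely on a rule of thumb.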
From: Ronald O. <ron...@ma...> - 2006-10-24 11:25:39
|
On Oct 24, 2006, at 12:16 PM, Jorge Godoy wrote:

> Ronald Oussoren <ron...@ma...> writes:
>
>> On Oct 23, 2006, at 10:54 PM, sophana wrote:
>>
>>>
>>> I'm still suffering on the python bug when you add a string to an
>>> unicode with +=, the string is encoded into ascii. I still don't
>>> understand why python didn't merge unicode and strings.
>>
>> Could you give an example? My understanding of what you write:
>>
>>     Python 2.4.4 (#1, Oct 18 2006, 10:34:39)
>>     [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
>>     Type "help", "copyright", "credits" or "license" for more information.
>>     >>> s = u'hello'
>>     >>> s += 'world'
>>     >>> s
>>     u'helloworld'
>>     >>>
>
> Change the order.
>
>     >>> s = u'olá'
>     >>> s += 'mundo'
>     >>> s
>     u'ol\xe1mundo'
>     >>> s = u'leite com '
>     >>> s += 'café'
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>     UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

That's not a bug, but a feature. Python assumes that str() instances are in the ascii encoding unless explicitly told otherwise. That's why you should always use explicit conversions between str and unicode.

AFAIK the best idiom for dealing with unicode is to convert all text to unicode as soon as it enters your application (where you still know the intended encoding) and convert it back to plain text when it leaves again. That way the core of your application doesn't have to worry about mixing unicode and str.

Ronald |
From: Ronald O. <ron...@ma...> - 2006-10-24 11:37:01
|
On Oct 24, 2006, at 12:41 PM, sophana wrote:

> Hartmut Goebel wrote:
>> Jorge Godoy wrote:
>>
>>>> Change the order.
>>>>
>>>>     >>> s = u'olá'
>>>>     >>> s += 'mundo'
>>>>     >>> s
>>>>     u'ol\xe1mundo'
>>>>     >>> s = u'leite com '
>>>>     >>> s += 'café'
>>>>     Traceback (most recent call last):
>>>>       File "<stdin>", line 1, in ?
>>>>     UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
>>
> This is indeed the typical very annoying feature I was talking about. In
> web application, these strings with unicode inside come from the web
> requests. Why isn't the string class simply replaced by the unicode
> class? Why 2 classes?

There are currently two classes because early versions of python didn't know about Unicode. In Python 3.x strings will be unicode and there will be a separate type for dealing with 8-bit data.

>> This is correct, since you tell python to add a Unicode-String to an
>> Asc-String:
>>     s += 'café'
>> is the same as
>>     s = s + 'café'
>>
> Ok, but why is the right string encoded into ascii and not into the same
> encoding as the left unicode string?
> Isn't the += operator an unicode method?

Unicode strings don't have an encoding. 8-bit strings (the str type) do have an encoding, but that encoding is implied by the environment and is not an attribute of the string. That's why python uses one (more or less arbitrary) encoding for implicit conversions. If you know better (because the protocol you use specifies that text is in UTF-8 format, or you know your string constants are encoded in koi-8, ...) you should do an explicit conversion instead of relying on the default conversion.

Ronald |
From: Jorge G. <jg...@gm...> - 2006-10-24 12:25:26
|
sophana <so...@zi...> writes:

> This is indeed the typical very annoying feature I was talking about. In
> web application, these strings with unicode inside come from the web
> requests. Why isn't the string class simply replaced by the unicode
> class? Why 2 classes?

In the future there will be only Unicode. For now there are two classes. Unfortunately, things that are born in the US typically only receive some kind of I18Nization later, not from the start. :-) As far as I know, only Java got Unicode from the beginning.

> Ok, but why is the right string encoded into ascii and not into the same
> encoding as the left unicode string?

Because that's the default encoding for Python. If I don't say what encoding it is in, then it will be processed with the default encoding, and that is ASCII.

> Isn't the += operator an unicode method?

No. It is a BaseString method.

> Why would % operator would be faster than string concatenation? There is
> much less work to do!

Do you really think so? And I believe he was talking about the join method.

-- 
Jorge Godoy <jg...@gm...> |