Using version 0.92, I'm hitting what seems to be a serious issue with Unicode support. When using the unicode="latin1" keyword in a database connection, I'm seeing two different problems:
1 - Some data selected from the database is returned as a raw string, and some as Unicode. For example, a string containing '\x92' is brought back as a raw string, whereas data containing '\xfb' is brought back as Unicode. Given that both bytes are outside the ASCII range, shouldn't they both be brought back as Unicode strings?
2 - If the conversion from Unicode to latin1 fails, I get back an exception. It would be helpful if you could specify in the constructor the error policy ('strict', 'ignore', or 'replace') that you want to use.
Does anyone know how to tackle #1? It looks like I'm going to have to stop using the unicode keyword and do the work myself.
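For anyone following along, here is a minimal sketch of what the three error policies in point 2 do when a Unicode string cannot be represented in latin1. It only uses Python's built-in codec machinery, not the database module; the helper name `encode_latin1` is made up for illustration, and the snippet is written for a current Python 3 interpreter rather than the Python 2 of this thread.

```python
# Hypothetical helper: encode text to latin1 with a selectable error policy.
def encode_latin1(value, errors="strict"):
    return value.encode("latin-1", errors)

# U+2019 (right single quotation mark) has no latin1 equivalent.
text = "it\u2019s"

# 'strict' (the default) raises an exception on the unencodable character.
try:
    encode_latin1(text)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# 'ignore' silently drops the character; 'replace' substitutes '?'.
ignored = encode_latin1(text, "ignore")
replaced = encode_latin1(text, "replace")
```

Until the connection accepts an error policy, doing the encode yourself like this before passing values to the driver is one way to get the behaviour you want.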
Anonymous
-
2003-05-10
An update on what's going on here: It seems that the Unicode conversion is only done for columns of type VARCHAR, not type LONGTEXT.
This sounds like a bug to me, anyone else think so?
In MySQL, LONGTEXT is exactly the same datatype as BLOB (or maybe LONGBLOB). Because of this, those columns are treated as BLOBs. If you don't use BLOBs, you can update your type converter dictionary so that those columns are returned as Unicode strings. Arguably, BLOBs should be returned as character arrays (for details, type in the interpreter: help("array") ), but it's impossible to please everyone in this case.
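A rough sketch of the converter-dictionary idea, written as a self-contained stand-in rather than against the real module. The names `BLOB`, `VAR_STRING`, `to_text`, and `convert_row` are hypothetical, and the numeric codes are only illustrative; in the actual driver you would copy its own conversions dictionary, key the new entry on its own field-type constant, and hand the result to the connection, so check the module's documentation for the exact names. Written for Python 3.

```python
# Illustrative type codes standing in for the driver's field-type constants.
BLOB = 252
VAR_STRING = 253

def to_text(raw, encoding="latin-1"):
    """Decode raw bytes from the server into a Unicode string."""
    return raw.decode(encoding)

# Assumed default behaviour: only VARCHAR-like columns are decoded.
base_converters = {VAR_STRING: to_text}

# Copy the defaults and also decode BLOB/TEXT columns.
# Only sensible if the application stores no real binary data in BLOBs.
converters = dict(base_converters)
converters[BLOB] = to_text

def convert_row(row, type_codes, conv):
    """Apply the converter registered for each column's type code, if any."""
    return tuple(conv.get(code, lambda v: v)(value)
                 for value, code in zip(row, type_codes))

row = (b"caf\xe9", b"\x92 quoted")
types = (VAR_STRING, BLOB)

before = convert_row(row, types, base_converters)  # BLOB column stays raw bytes
after = convert_row(row, types, converters)        # both columns are text
```

The trade-off is the one described above: once the BLOB code maps to a text decoder, every BLOB column comes back as text, including ones that really hold binary data.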