From: Alexander 'L. B. <le...@st...> - 2008-12-29 16:09:14
Hi!

I have a problem using latin1-encoded files with tora 2.0:

When opening a file, the locale from the environment seems to be taken into account. I.e. opening an ISO8859-1 encoded file will work with
  env LANG=en_US tora
or
  env LANG=en_US.iso88591 tora

Vice versa, opening a UTF8-encoded file will work fine with
  env LANG=en_US.utf8 tora

But when saving an SQL file, the encoding that's used is always UTF8.

So simply opening an ISO8859-1 encoded file with a locale setting of LANG=en_US and pressing the Save button will result in tora recoding the file to UTF8.

Cheers,
--leo
--
e-mail   ::: Leo.Bergolth (at) wu-wien.ac.at
fax      ::: +43-1-31336-906050
location ::: Computer Center | Vienna University of Economics | Austria
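The recoding described above can be reproduced outside tora with a few lines of Python. This is only a sketch of the reported behaviour under stated assumptions, not tora's actual code: the sample SQL text is made up, and the "open" step assumes a locale whose encoding is ISO8859-1.

```python
# Illustration only: mimics the reported open/save asymmetry, not tora's code.

# A made-up SQL snippet, stored on disk as ISO8859-1 (latin1) bytes.
original = "SELECT 'Größe' FROM dual;\n".encode("iso8859-1")

# "Open": decode with the locale's encoding (assumed here to be ISO8859-1).
text = original.decode("iso8859-1")

# "Save": always encode as UTF-8, regardless of the locale.
saved = text.encode("utf-8")

# The text is unchanged, but the bytes on disk are not: each non-ASCII
# character now takes two bytes instead of one.
print("bytes identical after save:", original == saved)
print("size before/after:", len(original), len(saved))
```

A tool that later reads the saved file as latin1 would see two garbage characters in place of each umlaut, which is the "broken encoding" symptom.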
From: Alexander 'L. B. <le...@st...> - 2009-01-05 17:00:45
Attachments:
tora-2.0.0-encoding.patch
Hi!

On 12/29/2008 05:09 PM, Alexander 'Leo' Bergolth wrote:
> I have a problem using latin1 encoded files with tora 2.0:
>
> When opening a file, the environment seems to be considered.
[...]
> But when saving an SQL-file the encoding that's used is always UTF8.
>
> So simply opening an ISO8859-1 encoded file with a locale setting of
> LANG=en_US and pressing the Save-Button will result in tora recoding the
> file to UTF8.

I've taken a look at the way tora handles encodings.

The main problem is the asymmetric way toReadFile() and toWriteFile() work:

toReadFile() reads in a file and decodes it according to the user's locale settings into a QString.

However, the overloaded version of toWriteFile() that takes a QString always uses Utf8 to encode the file!

Besides, there are some places where data is read and decoded according to the locale (toReadFile()) and then encoded back to Utf8 in the same line. There is also some casting between QString and QByteArray that does implicit de-/encoding.

I've attached a patch that attempts to address some of those problems:

toReadFile() and toWriteFile() are now symmetric and both consider the locale settings.

There's a new toReadFileB() that reads a file into a QByteArray without decoding it. Decoding it to a QString can be done later, e.g. via QString::fromUtf8(data). This is useful for handling binary data and e.g. configuration data, since IMHO this should always be encoded in a well-defined encoding.

I've also reviewed the classes that use those two functions:

toworksheet.cpp: now uses the current locale settings both for reading and writing files

toproject.cpp: now uses the current locale settings both for reading and writing files

tomarkedtext.cpp: now uses the current locale settings both for reading and writing files

tochartmanager.cpp: now uses the current locale settings both for reading and writing files

toconfiguration.cpp: uses Utf8 for reading and writing files (since it's configuration data)

tosql.cpp: uses Utf8 for reading and writing sql dictionaries.

tohelp.cpp: uses locale settings for reading HTML files. (Maybe this should be changed?)

migratetool/tora3.cpp: uses Utf8 for reading files

Please review the patch. It's not thoroughly tested but it fixes my problem with broken encodings in worksheets.

Cheers,
--leo
--
e-mail   ::: Leo.Bergolth (at) wu-wien.ac.at
fax      ::: +43-1-31336-906050
location ::: Computer Center | Vienna University of Economics | Austria
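The idea behind the patch (symmetric read/write plus an undecoded read) can be sketched in Python as follows. This is not the patch itself: the names read_file(), write_file(), read_file_bytes() and round_trip() are hypothetical stand-ins for toReadFile(), toWriteFile() and toReadFileB(), and the locale lookup via locale.getpreferredencoding() is an assumption about how "consider the locale settings" might be expressed in Python.

```python
import locale
import os
import tempfile


def read_file(path, encoding=None):
    """Read a file and decode it; defaults to the locale's encoding."""
    encoding = encoding or locale.getpreferredencoding(False)
    with open(path, "rb") as f:
        return f.read().decode(encoding)


def write_file(path, text, encoding=None):
    """Encode text and write it; uses the same default as read_file()."""
    encoding = encoding or locale.getpreferredencoding(False)
    with open(path, "wb") as f:
        f.write(text.encode(encoding))


def read_file_bytes(path):
    """Read a file without decoding (for binary or fixed-encoding data)."""
    with open(path, "rb") as f:
        return f.read()


def round_trip(raw, encoding):
    """Store raw bytes, then open and save them again via the helpers."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(raw)
        write_file(path, read_file(path, encoding), encoding)
        return read_file_bytes(path)
    finally:
        os.remove(path)


# A latin1 worksheet survives open + save unchanged when both sides agree.
latin1_sql = "SELECT 'Größe' FROM dual;\n".encode("iso8859-1")
print("unchanged:", round_trip(latin1_sql, "iso8859-1") == latin1_sql)

# Configuration-style data: read raw bytes, then decode with a fixed encoding.
config_text = "name=Größe\n".encode("utf-8").decode("utf-8")
print(config_text, end="")
```

Because both helpers fall back to the same default, a file opened and saved under one locale keeps its bytes, while callers that need a well-defined encoding (such as configuration data) read the raw bytes and decode explicitly.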
From: Michael M. <mic...@re...> - 2009-04-17 09:41:47
Alexander 'Leo' Bergolth wrote:
% Hi!

Hi Leo,

I've come across your file encoding patch and noticed that it hasn't been applied yet :(. So I've reviewed and applied it. Thanks for your contribution and sorry it took so long.

Michael

% On 12/29/2008 05:09 PM, Alexander 'Leo' Bergolth wrote:
% > I have a problem using latin1 encoded files with tora 2.0:
% >
% > When opening a file, the environment seems to be considered.
% [...]
% > But when saving an SQL-file the encoding that's used is always UTF8.
% >
% > So simply opening an ISO8859-1 encoded file with a locale setting of
% > LANG=en_US and pressing the Save-Button will result in tora recoding the
% > file to UTF8.
%
% I've taken a look at the way tora handles encodings.
%
% The main problem is the asymmetric way toReadFile() and toWriteFile() work:
%
% toReadFile() reads in a file and decodes it according to the user's
% locale settings into a QString.
%
% However, the overloaded version of toWriteFile() that takes a QString
% always uses Utf8 to encode the file!
%
% Besides, there are some places where data is read and decoded according
% to the locale (toReadFile()) and then encoded back to Utf8 in the same
% line. There is also some casting between QString and QByteArray that
% does implicit de-/encoding.
%
% I've attached a patch that attempts to address some of those problems:
% toReadFile() and toWriteFile() are now symmetric and both consider the
% locale settings.
% There's a new toReadFileB() that reads a file into a QByteArray without
% decoding it. Decoding it to a QString can be done later, e.g. via
% QString::fromUtf8(data). This is useful for handling binary data and e.g.
% configuration data, since IMHO this should always be encoded in a
% well-defined encoding.
%
% I've also reviewed the classes that use those two functions:
%
% toworksheet.cpp: now uses the current locale settings both for reading
% and writing files
%
% toproject.cpp: now uses the current locale settings both for reading
% and writing files
%
% tomarkedtext.cpp: now uses the current locale settings both for reading
% and writing files
%
% tochartmanager.cpp: now uses the current locale settings both for
% reading and writing files
%
% toconfiguration.cpp: uses Utf8 for reading and writing files (since it's
% configuration data)
%
% tosql.cpp: uses Utf8 for reading and writing sql dictionaries.
%
% tohelp.cpp: uses locale settings for reading HTML files. (Maybe this
% should be changed?)
%
% migratetool/tora3.cpp: uses Utf8 for reading files
%
%
% Please review the patch. It's not thoroughly tested but it fixes my
% problem with broken encodings in worksheets.
%
% Cheers,
% --leo
% --
% e-mail   ::: Leo.Bergolth (at) wu-wien.ac.at
% fax      ::: +43-1-31336-906050
% location ::: Computer Center | Vienna University of Economics | Austria

--
Michael Mráka