German umlauts do not work in that WKS file. Moreover, I found out that this file has an additional sheet (?) or at least something which is not shown (the word "Wachstum" appears in my hex editor but is not visible anywhere).
For example: "Verkõufe Gesamt" (cell B5) should be "Verkäufe Gesamt"
Tested with LibreOffice Version: 4.3.2.2
Build-ID: edfb5295ba211bd31ad47d0bad0118690f76407d
Anonymous
Hello,
normally, the first problem should be fixed in libwps-0.3.1 by https://sourceforge.net/p/libwps/code/ci/41dfedf3d25025fe1dc91e7a01d83a54d3259476 (see attachment tt.ods): this file uses LICS encoding, which is now recognized. However, the Windows DOS codepage does not appear in the file, so it still uses a default one.
Note: this file does contain a chart named Wachstum, but currently we do not reconstruct charts which appear in the file :-~
Hi, looking at the source code of libwps_tools_win.cpp
0.4.x, the conversion of LICS characters to Unicode
appears to be partially broken:
In Font::LICSunicode() the code converts all codes >= 0x80
according to a 128 byte translation table named LICS[],
and the resulting byte value is then stuffed into another
conversion routine named unicode(). unicode(), however,
erroneously carries out an OEM codepage to Unicode conversion
(with the OEM codepage being a variable input). This does
not make sense, as the conversion LICS->Unicode is fixed
and does not depend on an OEM codepage.
Further, if I assume codepage 850 (as it was still hardwired
in the older 0.3.1 issue of the file), carrying out the
conversion through codepage 850 results in the correct
Unicode values for some of the LICS codes, but not for all.
For illustration, these are the errors in the LICS code
range 0xD0 to 0xFE:
0xD7: code results in U+00D7, but should result in U+0152.
0xDD: code results in U+00DD, but should result in U+0178.
0xDE: code results in U+00FE, but should result in U+00DE.
0xE4: code results in U+00F6, but should result in U+00E4.
0xF7: code results in U+00F7, but should result in U+0153.
0xFE: code results in U+00DE, but should result in U+00FE.
Please compare your conversion with the LICS translation
documented in the article "Lotus International Character Set"
in the English Wikipedia, which is based on a printed
LICS table in a HP 95LX manual. While some of the codes
in the range 0x80 to 0xBF are still unknown or are
ambiguous, the conversion of codes in the range 0xC0
to 0xFE to Unicode is clearly unambiguous, so the Wikipedia
article can be used as a reference for these codes.
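To illustrate the point about a fixed conversion, here is a minimal sketch of a direct LICS-to-Unicode lookup that does not go through any OEM codepage. The function name licsToUnicode is hypothetical and is not part of libwps. Only the six codes listed above are filled in; all other bytes are passed through unchanged purely as a placeholder, which is not correct for the full LICS range.

```cpp
#include <cstdint>

// Hypothetical helper (not the libwps API): maps one LICS byte directly to a
// Unicode code point. No OEM codepage is involved at any step.
// Only the six codes from the list above are covered; every other byte is
// returned unchanged as a placeholder and would need a real table entry.
std::uint32_t licsToUnicode(std::uint8_t c)
{
    switch (c) {
    case 0xD7: return 0x0152; // LATIN CAPITAL LIGATURE OE
    case 0xDD: return 0x0178; // LATIN CAPITAL LETTER Y WITH DIAERESIS
    case 0xDE: return 0x00DE; // LATIN CAPITAL LETTER THORN
    case 0xE4: return 0x00E4; // LATIN SMALL LETTER A WITH DIAERESIS
    case 0xF7: return 0x0153; // LATIN SMALL LIGATURE OE
    case 0xFE: return 0x00FE; // LATIN SMALL LETTER THORN
    default:   return c;      // placeholder only, see comment above
    }
}
```

A complete version would presumably be a 128-entry table for 0x80 to 0xFF, indexed by (c - 0x80), with the still unknown codes marked explicitly.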
Hope this helps to improve the Lotus 1-2-3 file importer.
Matthias
Hello,
currently this code takes inspiration from the comments of https://bugs.documentfoundation.org/show_bug.cgi?id=87222 (a), so if you do
or choose Western Europe/OS2-437/US in LibreOffice, the result should be better.
Notes:
Last edit: alonso laurent 2016-12-12
Hi Alonso,
this won't work, as LICS has nothing to do with codepage 437. As far as I know, Lotus 1-2-3 Release 1.x supported "ASCII" files only (but didn't strip off bit 7, so in reality users relied on 8-bit OEM codepages, although this was undocumented). Lotus 1-2-3 Release 2.0 introduced LICS, and since 2.01 it allowed the user to choose between "ASCII" and LICS in setup. I don't know if there is a tag or attribute inside the file format to indicate which mode was chosen, or if the user just had to set up the program accordingly. Perhaps this info is also associated with the various filename extensions; this would need some research. (BTW, Japanese versions of Lotus for the NEC PC-98 used WJ1/WJ2/WJ3/WJ4 file extensions instead of WKS/WK1/WP2/WK3/WK4, probably a very similar file format, but obviously with some differences in detail as well.) Apparently, various third-party spreadsheets did not support LICS but just continued to use "ASCII", which actually means that they used whatever 8-bit OEM codepage was selected by the underlying operating system (often 437 or 850, but a whole bunch of other OEM codepages exist). With Lotus 1-2-3 Release 3.x they switched to yet another character set and encoding, LMBCS, which is also discussed in the English Wikipedia now (although the discussion is not complete at present).
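As a rough sketch of why the two modes must be kept apart: the same byte decodes differently depending on whether the file was written in "ASCII" (OEM codepage) mode or in LICS mode. All names below (TextMode, oem850ToUnicode, licsSampleToUnicode, decodeLabel) are hypothetical and not taken from libwps. Both tables are deliberately tiny samples, bytes below 0x80 are assumed to be plain ASCII in both modes, and codepage 850 is used only as an example of an OEM codepage.

```cpp
#include <cstdint>
#include <string>

// Hypothetical mode flag: how the mode is actually stored (if at all) in the
// file is an open question in this thread.
enum class TextMode { OemAscii, Lics };

// Sample of codepage 850 only; unknown bytes map to U+FFFD.
std::uint32_t oem850ToUnicode(std::uint8_t c)
{
    if (c < 0x80) return c;
    switch (c) {
    case 0x84: return 0x00E4; // a with diaeresis in codepage 850
    case 0xE4: return 0x00F5; // o with tilde in codepage 850
    default:   return 0xFFFD; // not covered by this sample
    }
}

// Sample of LICS only; unknown bytes map to U+FFFD.
std::uint32_t licsSampleToUnicode(std::uint8_t c)
{
    if (c < 0x80) return c;
    switch (c) {
    case 0xE4: return 0x00E4; // a with diaeresis in LICS
    default:   return 0xFFFD; // not covered by this sample
    }
}

// Decodes a cell label byte by byte using exactly one of the two tables.
std::u32string decodeLabel(const std::string &bytes, TextMode mode)
{
    std::u32string out;
    for (unsigned char b : bytes) {
        const std::uint32_t u = (mode == TextMode::Lics)
            ? licsSampleToUnicode(b)
            : oem850ToUnicode(b);
        out.push_back(static_cast<char32_t>(u));
    }
    return out;
}
```

With the bytes of cell B5 from the original report ("Verk", 0xE4, "ufe"), the LICS path gives "Verkäufe", while the codepage 850 path gives the "Verkõufe" that was observed.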
Hope it helps.
Greetings,
Matthias
Hello, I have changed the code to do the LICS conversion in one pass:
https://sourceforge.net/p/libwps/code/ci/830ac17bddd0c39faa8f51b951c6e112a08fd6ed/
Note: do you have any WJ1-WJ4 files, or any files which use LMBCS encoding?
Hello,
concerning the chart, I have added basic code to retrieve it in a separate sheet (it is not perfect, but it may be sufficient...)