
#22 Wrong characters in WP5 Greek character set

untriaged
open
nobody
characters (1)
2023-05-19
2022-11-26
No

There seem to be at least two errors in the Greek WP5 character set, at characters 8,38 and 8,39. The characters are correct in the Greek WP6 set. The details are here:

https://github.com/taviso/wpunix/issues/29#issuecomment-1328012872

Discussion

  • Edward Mendelson

    It's easy to test which is right. Export the Greek character set from WP5 (perhaps using LibreOffice to produce the output) and see if those characters match the Unicode glyphs with the same name. You'll see that the two characters I mentioned are incorrect.

    The report on the change was made so that output from a commercial product called Printer Polyglott looked right; it seems that Printer Polyglott had mistakes that LibreOffice and modern Linux don't have.

     
  • Edward Mendelson

    I've done some further testing, and the change definitely should be reverted, because it substituted the correct characters with characters that would print in an obsolete program that got it wrong. Both Corel (in current WordPerfect for Windows) and Microsoft (with its WP import filter) use the original mapping that got changed incorrectly.

    Here is a WordPerfect for DOS 5.1 file with the four relevant characters:
    https://www.dropbox.com/s/hogzhfhx4ulm8wr/GREEKWP5.WP?dl=0

    Here is the same file opened and saved in WordPerfect for Windows:
    https://www.dropbox.com/s/ylcrvcgrm90co4s/GREEKWP5.wpd?dl=0

    Here is the WPDOS 5 file converted to DOCX by Word for Windows:
    https://www.dropbox.com/s/ovxx7m6dec8tdfb/GREEKWP5-fromWPWtoWord.docx?dl=0

    Here is the WPDOS5 file converted by WriterPerfect (with the wrong Unicode symbols):
    https://www.dropbox.com/s/3s6un2mlnt4vavx/GreekWP5fromWriterPerfect.odt?dl=0

    This seems to be the same as the same file converted by LibreOffice (also with the wrong symbols):
    https://www.dropbox.com/s/co8l8njb7c2knw0/GREEKWP5fromLibreOffice.odt?dl=0

    Summary - and this applies only to the WP5 Greek Character set:
    WP character 8,38 should be 0x03a3
    WP character 8,39 should be 0x03c2
    WP character 8,54 should be 0x03ae
    WP character 8,65 should be 0x03f1
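The four mappings in this summary can be written down as a small check. This is only a sketch: `expectedWp5Greek` is a hypothetical helper, not a function in libwpd, and the Unicode names in the comments are the standard names for those code points.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of libwpd): returns the code point that this
// ticket says a WP5 Greek (character set 8) index should map to, or 0 if the
// index is not one of the four characters discussed here.
constexpr std::uint32_t expectedWp5Greek(int index)
{
    switch (index)
    {
    case 38: return 0x03a3; // GREEK CAPITAL LETTER SIGMA
    case 39: return 0x03c2; // GREEK SMALL LETTER FINAL SIGMA
    case 54: return 0x03ae; // GREEK SMALL LETTER ETA WITH TONOS
    case 65: return 0x03f1; // GREEK RHO SYMBOL
    default: return 0;      // not covered by this ticket
    }
}

// Compares one converted code point against the expectation above.
// Indices the ticket does not mention are treated as "no opinion" and pass.
constexpr bool matchesTicket(int index, std::uint32_t converted)
{
    return expectedWp5Greek(index) == 0 || expectedWp5Greek(index) == converted;
}
```

A converter's output for 8,54 could then be checked with `matchesTicket(54, value)`, which is false for 0x03a5 and true for 0x03ae.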

    The last character is obviously wrong in WriterPerfect and LibreOffice - it's a Greek rho with a breathing mark, not a variant of the Greek rho. But all of them are wrong. The change should be reverted, although it may need some manual work, as the line numbering has changed a lot since 2010.

     

    Last edit: Edward Mendelson 2023-04-17
  • Fridrich Strba

    Fridrich Strba - 2023-05-18

    I reverted the commit. Please verify that this did not change a correct character into a wrong one.

     
  • Edward Mendelson

    Thank you! One character is now incorrect, because the code before the old commit had an error that was later fixed. To clarify, the second of the three changed lines should NOT have the change from 0x03ae to 0x03a5 shown below:

    • 0x03a8, 0x03c8, 0x03a9, 0x03c9, 0x03ac, 0x03ad, 0x03ae, 0x03af, <-- CORRECT
    • 0x03a8, 0x03c8, 0x03a9, 0x03c9, 0x03ac, 0x03ad, 0x03a5, 0x03af, <-- WRONG

    The other two changed lines now have the correct code. I built your latest commit under macOS and can confirm that only one character now needs to be changed.

     
  • Edward Mendelson

    I may have made things confusing; if so, I apologize. The latest commit doesn't correct line 808 of libwpd_internal.cpp. The line should read:

    0x03a8, 0x03c8, 0x03a9, 0x03c9, 0x03ac, 0x03ad, 0x03ae, 0x03af,

    Notice that the seventh code is 0x03ae, NOT 0x03a5 as it is now. This is the only thing that needs to be changed. Thank you for this!
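To make the single difference easy to spot, the two versions of the row can be compared entry by entry. This is a minimal sketch: the array names and `firstDifference` are made up for illustration and do not appear in libwpd_internal.cpp; only the sixteen values are taken from this thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Row = std::array<std::uint32_t, 8>;

// The row as it should read, per this comment.
const Row correctedRow = {0x03a8, 0x03c8, 0x03a9, 0x03c9, 0x03ac, 0x03ad, 0x03ae, 0x03af};
// The row as it stood after the revert, with the wrong seventh entry.
const Row revertedRow  = {0x03a8, 0x03c8, 0x03a9, 0x03c9, 0x03ac, 0x03ad, 0x03a5, 0x03af};

// Returns the zero-based position of the first entry that differs,
// or -1 if the two rows are identical.
int firstDifference(const Row &a, const Row &b)
{
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        if (a[i] != b[i])
            return static_cast<int>(i);
    }
    return -1;
}
```

Here `firstDifference(correctedRow, revertedRow)` gives the zero-based position 6, that is, the seventh code in the row.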

     
    • Fridrich Strba

      Fridrich Strba - 2023-05-19

      Actually, I am the one confused. Most likely, I ran the "make astyle" with unsaved changes and reloaded the file afterwards. Now it should be ok.

       
  • Edward Mendelson

    This is correct now. Thank you. I think it can be applied to LibreOffice now...

     
