#482 Including Unicode by copy/paste gives invalid UTF-8

0.7.10
open
nobody
5
2009-11-21
2009-08-18
Karl Pentzlin
No

To test the Soft Hyphen in an HTML application, I created the string "test-" (where "-" denotes the Soft Hyphen, SHY, Unicode U+00AD) with BabelMap 5.1.0.5 from http://www.babelstone.co.uk/Software/BabelMap.html and copied it to the clipboard using that application's Copy button. I then opened KompoZer 0.7.10, where "Character Code: UTF-8" is preset in the New Page settings, pasted "test-" several times (Ctrl-V) into an empty page, and saved the page, entering the title "SHY test" when prompted.

Examining the saved file in a hex editor, I see a correct header containing content="text/html; charset=UTF-8", but a single byte 0xAD wherever the UTF-8 encoding of the SHY (the two bytes 0xC2 0xAD) should appear. This is invalid UTF-8: Firefox displays replacement characters (U+FFFD) in its place, and other applications behave similarly.
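To illustrate the difference between the expected and the observed bytes, here is a minimal Python sketch. It is not part of KompoZer, and the helper names (`utf8_bytes`, `latin1_bytes`, `invalid_utf8_offsets`) are hypothetical. It shows that U+00AD encodes to 0xC2 0xAD in UTF-8 but to the single byte 0xAD in ISO-8859-1. That the saved file matches the latter suggests, but does not prove, that the pasted text was written out in a Latin-1 code page despite the UTF-8 declaration. The sketch also lists every byte offset at which data fails to decode as UTF-8.

```python
from typing import List

SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def utf8_bytes(text: str) -> bytes:
    """Encode text as UTF-8 (what a charset=UTF-8 page should contain)."""
    return text.encode("utf-8")


def latin1_bytes(text: str) -> bytes:
    """Encode text as ISO-8859-1, where U+00AD is the single byte 0xAD."""
    return text.encode("latin-1")


def invalid_utf8_offsets(data: bytes) -> List[int]:
    """Return the byte offset of every sequence that is not valid UTF-8."""
    offsets = []
    pos = 0
    while True:
        try:
            data[pos:].decode("utf-8")
            return offsets
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice that was decoded.
            offsets.append(pos + err.start)
            pos += err.end


if __name__ == "__main__":
    sample = "test" + SHY
    print(utf8_bytes(sample).hex(" "))    # 74 65 73 74 c2 ad
    print(latin1_bytes(sample).hex(" "))  # 74 65 73 74 ad
    print(invalid_utf8_offsets(utf8_bytes(sample)))    # []
    print(invalid_utf8_offsets(latin1_bytes(sample)))  # [4]
    # errors="replace" substitutes U+FFFD, as browsers do for malformed UTF-8:
    print(latin1_bytes(sample).decode("utf-8", errors="replace") == "test\ufffd")  # True
```

To check a saved page, pass its raw contents, for example `invalid_utf8_offsets(open(path, "rb").read())` with `path` pointing at the file. Each reported offset should line up with one of the 0xAD bytes seen in the hex editor.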

Discussion

  • Karl Pentzlin
    2009-08-18

    The HTML file generated by KompoZer

     
  • Setting group to 0.7.10

     
    • milestone: --> 0.7.10