#482 Including Unicode by copy/paste gives invalid UTF-8


To test the soft hyphen in an HTML application, I created the string "test-" (the "-" denotes the soft hyphen, SHY, Unicode U+00AD) with BabelMap from http://www.babelstone.co.uk/Software/BabelMap.html and copied it to the clipboard using the Copy button in that application. Then I opened KompoZer 0.7.10, where I have preset "Character Code: UTF-8" in the New Page settings. I pasted "test-" several times (with Ctrl-V) into an empty page and saved the page, giving it the simple title "SHY test" when prompted. Analyzing the generated code with a hex editor, I see a correct header containing content="text/html; charset=UTF-8", but a single byte 0xAD at the places where the UTF-8 encoding of the SHY (0xC2 0xAD) should occur. This is invalid UTF-8: Firefox displays replacement characters, and other applications behave similarly.
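
For anyone who wants to reproduce the check without a hex editor, here is a minimal Python sketch. It compares the bytes a UTF-8 file should contain for "test" plus SHY with the bytes described above, and lists the offsets where strict UTF-8 decoding fails. The helper name `find_invalid_utf8` is made up for illustration. Producing the observed bytes via a Latin-1 encode is only an assumption that happens to match the 0xAD seen in the file, not a confirmed diagnosis of what KompoZer does internally.

```python
from typing import List

SHY = "\u00ad"  # SOFT HYPHEN


def find_invalid_utf8(data: bytes) -> List[int]:
    """Return the byte offsets at which strict UTF-8 decoding fails.

    Hypothetical helper for this report, not part of KompoZer or BabelMap.
    """
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")  # strict decoding is the default
            break                       # the rest of the data is valid
        except UnicodeDecodeError as err:
            offsets.append(pos + err.start)
            pos += err.start + 1        # skip the offending byte and retry
    return offsets


# What a UTF-8 file should contain for "test" + SHY.
expected = ("test" + SHY).encode("utf-8")

# Assumption: a single-byte (Latin-1 style) encoding of U+00AD yields
# the lone 0xAD byte that the hex editor shows.
observed = ("test" + SHY).encode("latin-1")

print(expected.hex(" "))            # 74 65 73 74 c2 ad
print(observed.hex(" "))            # 74 65 73 74 ad
print(find_invalid_utf8(expected))  # []
print(find_invalid_utf8(observed))  # [4]
```

To check a saved page, read it in binary mode and pass the bytes to the helper, for example `find_invalid_utf8(open("shy-test.html", "rb").read())`, where the file name is a placeholder.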


  • Frédéric Chateaux

    Setting group to 0.7.10

  • Frédéric Chateaux

    • milestone: --> 0.7.10
