Incorrect Encoding Saved

    Turbo-Pascal - 2014-04-10

    Hi, and first of all, thanks for the program.

    I have a Unicode text file that contains Unicode characters that are supported in Windows NTFS; by that I mean all the characters in the text file can be used within filenames on NTFS.

    When I open the text file in Notepad, everything is fine and dandy. If I open the file in Notepad++, the character encoding is lost on some of the characters: instead of the characters, I get boxes. Examples of characters I use include:

    ⁅ ⁆
    ❝ ❞

    Is there a way to fix this issue?

    Thanks in advance


    THEVENOT Guy - 2014-04-11

    Hello Turbo Pascal,

    The first two characters in your example have the Unicode code points \x{2045} and \x{2046} and are part of the General Punctuation chart.

    The last two characters have the Unicode code points \x{275d} and \x{275e} and are part of the Dingbats chart.
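
    To double-check the code points quoted above, here is a minimal Python sketch using the standard unicodedata module. The helper name describe is made up for this illustration and is not part of Notepad++ or of any library.

```python
import unicodedata

def describe(chars):
    """Return (character, 'U+XXXX', Unicode name) for each character.

    Hypothetical helper, written only for this illustration.
    """
    return [
        (c, f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>"))
        for c in chars
    ]

# The four characters from the original question.
for char, code_point, name in describe("⁅⁆❝❞"):
    print(char, code_point, name)
```

    All four code points are below U+FFFF, so each character fits in a single 16-bit unit of the UCS-2 / UTF-16 encodings discussed in this thread.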

    You can get all the Unicode characters from these two links:

    So your Unicode text file is almost certainly saved with a Unicode encoding. To verify, open it in Microsoft Notepad and choose File - Save As... The encoding shown in the Save As... dialog should be Unicode, Unicode big endian or UTF-8.

    Then, when you open this file in Notepad++, the encoding displayed on the right side of the N++ status bar should normally be UCS-2 Little Endian, UCS-2 Big Endian or UTF-8.
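
    The same check can be done outside either editor by looking at the byte-order mark (BOM) at the start of the file. The sketch below assumes the file was saved with a BOM, as Notepad of that era did for all three encodings listed above (its "Unicode" being UTF-16 little endian). The names BOMS and sniff_bom are hypothetical and exist only for this example.

```python
import codecs

# Known byte-order marks. The UTF-32 marks come before the UTF-16 ones
# because the UTF-32 LE mark begins with the same two bytes as UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF32_LE, "UTF-32 little endian"),
    (codecs.BOM_UTF32_BE, "UTF-32 big endian"),
    (codecs.BOM_UTF16_LE, "UTF-16 little endian"),
    (codecs.BOM_UTF16_BE, "UTF-16 big endian"),
]

def sniff_bom(data: bytes) -> str:
    """Guess the encoding of raw file bytes from a leading BOM only."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label
    return "no BOM found"

# Build sample bytes in memory instead of reading a real file.
sample = codecs.BOM_UTF16_LE + "⁅ ⁆ ❝ ❞".encode("utf-16-le")
print(sniff_bom(sample))
```

    For a real file, pass in the first few bytes read in binary mode. A file without a BOM returns "no BOM found", which says nothing about its actual encoding.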

    So why does your text display boxes instead of the right characters?

    Well, in Microsoft Notepad, as long as the current file has a Unicode encoding, you'll see all the Unicode characters, even if the current font used in Notepad can't display some of them.

    I don't know exactly which mechanism Microsoft uses to achieve this behaviour (and I haven't looked into it).

    But in Notepad++, you need to select, as the current font, a font that can actually display these Unicode characters.

    On my French computer, your four characters, once copied, are displayed correctly if the current font is set to Arial Unicode MS, Lucida Sans Unicode or Tahoma. To set the font, follow the method below:

    • Open the Style Configurator (Settings - Style Configurator...)

    • Choose Global Styles in the Language list

    • Choose Default Style in the Style list

    • Choose one of the appropriate fonts in the Font Style zone, on the right of the dialog

    See the attached picture Pascal.png below.

    On the Web, you can also search for Unicode fonts, or get newer versions of your installed fonts, and install them on your operating system.

    As for me, some time ago I downloaded and installed the ARIALUNI.TTF font, which can display 50377 different characters :-) Just note that these Unicode fonts are generally quite big (22.1 MB for ARIALUNI.TTF).



    Last edit: THEVENOT Guy 2014-04-11

    Turbo-Pascal - 2014-04-11

    Hello Guy,

    That's a perfect and detailed answer. I am the one that should say cheers.

    For anyone else: I am now using "Lucida Sans Unicode", an ugly font, but it is monospaced, which is a must-have.

    Thanks Guy :)

