Pasting UTF-8 from HTML page to text in UTF-8

  • Pavel

    Hello, advice needed.
    I tried to copy and paste from a web page (Firefox character encoding Unicode-UTF-8). I am only interested in pasting the pronunciation transcription /prəˌnʌnsiˈeɪʃn/. Here it displays correctly, but when I paste it into a Notepad++ text file with UTF-8 encoding, it is displayed wrong.
    Where is the problem? Thanks.

  • cchris

    How is it displayed wrong?
    - With squares, indicating your font needs to be changed to properly display the expected glyphs?
    - with HTML entities?
    - Does Edit -> Clipboard Operations -> Paste HTML help?


  • Pavel

    It is displayed with squares. I also tried the latest version of npp, 5.9.8 (normally I use 5.8.2), but the result is the same.
    I also tried changing the default Courier font to Times New Roman via Style Configurator - Global Styles. Firefox is not very revealing about which font it uses: Tools - Options - Content shows Times New Roman as the default font, though pages can use their own fonts.
    Using Edit - Paste Special - Paste HTML Contents produces the following:
    <!--StartFragment--><span class="i">prəˌnʌnsiˈeɪʃn</span><span class="z"></span><!--EndFragment-->
    Funnily enough, when pasted here it is displayed correctly, yet in npp it shows up as squares.
    I am at my wits' end.

  • Marc Kupper

    The characters you copy/pasted are not in the default font that npp uses for display, so they end up being displayed as square boxes. The file, though, contains the correct UTF-8. In plain ASCII we have
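    As a rough illustration of that point, here is a minimal Python sketch that shows what the transcription looks like as UTF-8 bytes and as an ASCII-only escaped string. It assumes the pasted text uses the standard IPA code points (U+0259, U+02CC, U+028C, U+02C8, U+026A, U+0283); the variable names are made up for the example, and this is not the listing from the original post.

```python
# The transcription, written with explicit escapes so the source stays ASCII.
# Assumption: these are the code points behind the pasted IPA text.
word = "pr\u0259\u02ccn\u028cnsi\u02c8e\u026a\u0283n"

# Encode to UTF-8: each of the six IPA characters becomes two bytes.
encoded = word.encode("utf-8")
hex_view = " ".join(f"{b:02x}" for b in encoded)

# ASCII-only view: non-ASCII characters are shown as \uXXXX escapes.
ascii_view = word.encode("ascii", "backslashreplace").decode("ascii")

print(len(word), "characters,", len(encoded), "bytes")
print(hex_view)
print(ascii_view)

# Round trip: decoding the bytes gives back the same text, so the data
# is intact even when the editor's font cannot draw the glyphs.
assert encoded.decode("utf-8") == word
```

    If the bytes in the saved file match this pattern, the file itself is fine and only the display font is at fault.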


    You would need to find a font that has those Unicode characters and then configure npp to use that font.  You do this in Settings / Style Configurator and here you will see a very cool thing about npp.  Bring up a short UTF-8 test.htm file that has prəˌnʌnsiˈeɪʃn in it. Start Settings / Style Configurator and in the far left pane select HTML. Now select the Font name field in the far right and scroll it with the up/down arrow keys. You can watch the font getting applied instantly as you scroll through the list. On my system only the "MS PGothic" and "MS UI Gothic" fonts displayed the example text correctly.  I'm not sure why applications such as Firefox were able to display the example text with a much wider variety of fonts.
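    To know what a candidate font has to cover, it can help to list the non-ASCII characters in the text. The sketch below is only an illustration using Python's standard unicodedata module; the helper name non_ascii_report is hypothetical, and the code points are assumed to be the standard IPA ones. It does not check any font, it only names the characters the font would need.

```python
import unicodedata

# Assumption: the pasted transcription uses these standard IPA code points.
word = "pr\u0259\u02ccn\u028cnsi\u02c8e\u026a\u0283n"


def non_ascii_report(text):
    """Return (code point, Unicode name) for each distinct non-ASCII character."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in seen]


report = non_ascii_report(word)
for code_point, name in report:
    print(code_point, name)
```

    The six characters listed fall in the IPA Extensions (U+0250 to U+02AF) and Spacing Modifier Letters (U+02B0 to U+02FF) blocks, so a font needs glyphs from both ranges to show the transcription without squares.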

  • Pavel

    Thanks a lot, the procedure works perfectly, though I had to enable the global font. Problem resolved. Three fonts show the characters: MS PGothic, MS UI Gothic and Lucida Sans Unicode.
    Finally, I printed the web page in Firefox using PDFCreator, opened the output PDF file in Foxit Reader and looked at the embedded fonts in its properties: Firefox used Lucida Sans Unicode for the IPA characters in the pronunciation.