
Convert ASCII to UTF8. Is it possible?

2007-03-06
2012-11-13
  • Nobody/Anonymous

    I mean not just changing the encoding, because doing only that doesn't really convert the characters; they become garbage. Windows XP Notepad really does the conversion. Does Notepad++ have this function?
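A minimal Python sketch of the difference between relabelling and converting. It assumes the original file is in Windows-1252 (`cp1252`, the usual "ANSI" code page on Western Windows); the sample word is arbitrary. Pure 7-bit ASCII bytes are already valid UTF-8, so the problem only appears with characters such as é.

```python
# Assumption: the source text was saved as Windows-1252 ("ANSI").
legacy_bytes = "café".encode("cp1252")  # b'caf\xe9': é is a single byte

# Relabelling: keep the bytes, just claim they are UTF-8.
# The lone byte 0xE9 is not valid UTF-8, so it decodes to U+FFFD.
relabelled = legacy_bytes.decode("utf-8", errors="replace")

# Converting: decode with the real encoding, then re-encode as UTF-8.
converted = legacy_bytes.decode("cp1252").encode("utf-8")

print(ascii(relabelled))  # 'caf\ufffd'
print(converted)          # b'caf\xc3\xa9'
```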

     
    • Michel Merlin

      Michel Merlin - 2007-03-12

      ISO-8859-1 often better than UTF-8
      ~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
      Posted on http://sourceforge.net/forum/forum.php?thread_id=1687592&forum_id=331754

      UTF-8 looks correct in American English texts, because they are mostly plain ASCII. But in European texts, each European special character turns into two garbage characters as soon as you try to source-edit the text in OE (Outlook Express), i.e. in 80% of the market. This is why you see UTF-8 mostly used in pure-ASCII text (such as emails from American people and companies).
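The "garbage" described here is what happens when UTF-8 bytes are read back by a program that assumes a single-byte charset. A small Python sketch, assuming the reader falls back to ISO-8859-1 (it illustrates the general effect and is not a reproduction of OE's internals):

```python
# Each accented letter is one character but two bytes in UTF-8.
text = "Café crème"
utf8_bytes = text.encode("utf-8")

# A reader that assumes ISO-8859-1 turns every byte into its own character.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)  # CafÃ© crÃ¨me

# As long as nothing was edited, the damage can still be undone.
repaired = misread.encode("iso-8859-1").decode("utf-8")
print(repaired == text)  # True
```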

      UTF-8 (as Microsoft itself admitted) causes problems in conversions. Its goal is nice (representing all characters on earth with a single charset, conveniently and reliably), but it has not reached it yet; for now it looks more like an Esperanto-like attempt at speaking a single language. IMO it has the potential to do better, but has NOT yet reached the reliability needed for wide use.

      So, while waiting for a better solution (which may take a long time: this question is more complex than some expect), I think the best approach is to choose *more accurate and appropriate* charsets, while keeping them *few in number*. For now you can't avoid using several different charsets: for Western languages, Central European, Korean, Japanese, Chinese, Arabic, ...

      To write all Western languages (English, Spanish, Portuguese, German, French), use "Western European ISO" (ISO-8859-1), which is the only one identical to the beginning of the universal char table. Don't use ISO-8859-15, which differs, with little benefit (mainly the Euro Typographical Symbol, plus Š, š, Ž, ž, Œ, œ and Ÿ; you had better replace the Euro with the FINANCIAL symbol "EUR", easily understood and written by any person or program or machine or printer in the world, from N.Y. to Bangkok to Paris to Dakar).

      If your friend reads his mail on http://mail.yahoo.com, your message will come back with garbage if it was encoded in UTF-8; if it was in ISO-8859-15, the Euro Typographical Symbol will be replaced with the generic Currency typographical Symbol (¤), making the text wrong; and so on.
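The ISO-8859-1 and ISO-8859-15 points can be checked with Python's built-in codecs: decoding every byte value as ISO-8859-1 yields exactly code points U+0000 to U+00FF, and comparing it with ISO-8859-15 lists the eight positions where they differ, including 0xA4 (¤ in ISO-8859-1, € in ISO-8859-15). A mail client that assumes ISO-8859-1 would therefore show ¤ where € was meant.

```python
all_bytes = bytes(range(256))

# ISO-8859-1: byte value == Unicode code point for all 256 bytes.
latin1 = all_bytes.decode("iso-8859-1")
print(all(ord(ch) == i for i, ch in enumerate(latin1)))  # True

# ISO-8859-15 (Latin-9): list the bytes whose meaning changed.
latin9 = all_bytes.decode("iso-8859-15")
differences = [
    (hex(i), old, new)
    for i, (old, new) in enumerate(zip(latin1, latin9))
    if old != new
]
for byte, old, new in differences:
    print(byte, old, "->", new)
print(len(differences))  # 8
```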

      Now this is for *email* (because email frequently gets edited by your reader); for *web pages*, UTF-8 is a good solution (provided you are careful while editing it).

      Details (pay no attention to heated comments posted without reading everything first):

      1) Many messages on the MS newsgroups, including:

      From: Michel Merlin <michel.merlin@laposte.net>
      Newsgroup: news://msnews.microsoft.com/microsoft.public.outlookexpress.general
      Message: news://msnews.microsoft.com/u0Sfc6bOHHA.1248@TK2MSFTNGP02.phx.gbl
      Subject: OE can't edit HTML source of UTF-8 European messages
      Posted: Tue 16 Jan 2007 23:35:20 +0100 (22:35:20 GMT)

      2) On Sitepoint, http://www.sitepoint.com/forums/showthread.php?t=450442&page=2#post3250318 "Please post successful test of source-editing UTF-8 European HTML", posted Sun 21 Jan 2007 16:39:10 GMT (images explaining the garbage caused by UTF-8 when source-editing European text in OE)

      Versailles, Mon 12 Mar 2007 15:33:25 +0100

       
      • Nobody/Anonymous

        Thank you, very interesting information. However, is there a way to convert to UTF-8 in Notepad++?

         
        • Nobody/Anonymous

          [quote: michelmerlin] UTF-8 (as Microsoft itself admitted) causes problems in conversions. Its goal is nice (representing all chars on earth with a single charset with convenience and reliability), but not reached yet; for now it looks more like an Esperanto-like attempt at speaking a single language; IMO it has the potential to achieve better, but has NOT reached yet the reliability needed for wide use [/quote]

          I'm also interested in a "Convert to UTF-8" function (UTF-8 or whatever). I don't see how "Encode to UTF-8" can be useful.

           
          • Don HO

            Don HO - 2007-11-22

            > I'm also interested in a "Convert to UTF-8" function (UTF-8 or whatever). I don't see how "Encode to UTF-8" can be useful.
            I'll consider it.

            In the meantime, you can do the following steps to get what you want:
            1. Ctrl+A
            2. Ctrl+X
            3. Menu Format->Encode in UTF-8
            4. Ctrl+V
            5. Ctrl+S

            You can record this sequence as a macro for later use.

            Don
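For files outside the editor, the same "decode with the real encoding, write out as UTF-8" idea can be sketched in Python. This is not a Notepad++ feature; the function name `convert_to_utf8` and the default source encoding `cp1252` are assumptions for illustration, so pass the encoding the file was actually saved in.

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Hypothetical helper: re-encode a text file from source_encoding to UTF-8."""
    # Work on raw bytes so line endings (CRLF or LF) are kept exactly as they were.
    raw = src.read_bytes()
    # Raises UnicodeDecodeError if a byte is undefined in source_encoding.
    text = raw.decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))
```

For example, `convert_to_utf8(Path("old.txt"), Path("new.txt"))`. Use the codec name `"utf-8-sig"` instead of `"utf-8"` if the program reading the result expects a UTF-8 byte-order mark.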

             
          • Nobody/Anonymous

            I forgot to write my comment to michelmerlin: "I can't believe what I read."

             
    • Nobody/Anonymous

      I opened a file with various editors and saw a NUL before every character! Even a binary editor showed NULs everywhere. "View whitespace" WAS turned off. It turned out the file had been saved as "Unicode" with MS Notepad. I re-saved it as ANSI and the NULs were gone.
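What this poster saw is consistent with UTF-16: assuming Notepad's "Unicode" option saves UTF-16 little-endian with a byte-order mark (FF FE), every ASCII character takes two bytes, the second of which is 0x00. A Python sketch of the bytes involved, with arbitrary sample text:

```python
text = "ANSI"

# Assumption: "Unicode" in Notepad = UTF-16 little-endian plus BOM FF FE.
utf16 = b"\xff\xfe" + text.encode("utf-16-le")
print(utf16)           # b'\xff\xfeA\x00N\x00S\x00I\x00'
print(utf16.count(0))  # 4: one NUL byte per ASCII character

# Re-saving in a single-byte "ANSI" code page removes the NULs.
restored = utf16.decode("utf-16").encode("cp1252")
print(restored)        # b'ANSI'
```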

       