Notepad++ / Discussion / [READ ONLY] Help: Convert ASCII to UTF8. Is it possible?

Nobody/Anonymous - 2007-03-06

I mean an actual conversion, not just changing the encoding: doing just that doesn't really convert the characters, they become garbage. Windows XP Notepad really does the converting. But does np++ have this function?

- Michel Merlin - 2007-03-12
  
  ISO-8859-1 often better than UTF-8
  ~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
  Posted on http://sourceforge.net/forum/forum.php?thread_id=1687592&forum_id=331754
  
  UTF-8 looks correct in US American texts, because they are mostly plain ASCII. But in European texts it turns each European special character into 2 garbage chars as soon as you try to source-edit the text in OE (Outlook Express), i.e. in 80% of the market. This is why you see UTF-8 mostly used in pure-ASCII text (such as emails from American people and companies).
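
The "2 garbage chars" effect described above is what appears when UTF-8 bytes are read back as a single-byte charset. A minimal Python sketch, assuming the reading side decodes as ISO-8859-1 (the post does not say which charset OE actually falls back to; the variable names are only illustrative):

```python
# Assumption: the viewer misreads UTF-8 bytes as ISO-8859-1.
text = "café"
utf8_bytes = text.encode("utf-8")          # 'é' becomes two bytes: 0xC3 0xA9
garbled = utf8_bytes.decode("iso-8859-1")  # each byte is shown as its own character
# The damage is reversible as long as the garbled text was not edited.
restored = garbled.encode("iso-8859-1").decode("utf-8")

print(garbled)   # cafÃ©
print(restored)  # café
```

Plain ASCII letters survive unchanged because UTF-8 and ISO-8859-1 encode them with the same single byte, which is why the problem only shows up around accented characters.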
  
  UTF-8 (as Microsoft itself admitted) causes problems in conversions. Its goal is nice (representing all chars on earth with a single charset, with convenience and reliability), but it has not been reached yet; for now it looks more like an Esperanto-like attempt at speaking a single language; IMO it has the potential to achieve better, but has NOT yet reached the reliability needed for wide use.
  
  So, while waiting for a better solution (which may take long: this question is more complex than some expect), I think the best is to choose *more accurate and appropriate* charsets, and *fewer of them*. So far you can't avoid using several different charsets, for Western languages, Central European, Korean, Japanese, Chinese, Arabic, ...
  
  To write all Western languages (English, Spanish, Portuguese, German, French), use "Western European ISO" (ISO-8859-1), which is the only one identical to the beginning of the universal char table. Don't use ISO-8859-15, which differs, with little benefit (just the Euro typographical symbol, which you had better replace with the FINANCIAL symbol "EUR", easily understood and written by any person or program or machine or printer in the world, from N.Y. to Bangkok to Paris to Dakar).
  
  If your friend handles his mail on http://mail.yahoo.com , your message, if encoded in UTF-8, will be returned with garbage; if in ISO-8859-15, the Euro typographical symbol will be replaced with the generic currency symbol, making it wrong; and so on.
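
The Euro-to-currency-symbol swap and the "identical to the beginning of the universal char table" remark can both be checked with a few lines of Python. This is only a sketch using Python's built-in codecs; it says nothing about what any particular mail service does internally:

```python
# Only the byte 0xA4 is stored; its meaning depends on the charset used to read it.
euro_byte = "€".encode("iso-8859-15")        # b'\xa4'
as_latin1 = euro_byte.decode("iso-8859-1")   # '¤' (U+00A4, generic currency sign)
as_latin9 = euro_byte.decode("iso-8859-15")  # '€' (U+20AC)

# ISO-8859-1 maps every byte value to the Unicode code point with the same number.
same_as_unicode = all(
    bytes([b]).decode("iso-8859-1") == chr(b) for b in range(256)
)

print(as_latin1, as_latin9, same_as_unicode)
```

So a message written as ISO-8859-15 but displayed as ISO-8859-1 keeps all its bytes intact and still shows the wrong symbol.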
  
  Now this is for *email* (because email frequently gets edited by your reader); for *web pages*, UTF-8 is a good solution (provided you are careful while editing it).
  
  Details (don't mind heated comments posted before complete reading):
  
  1) On MS newsgroups, many messages, of which:
  
  From: Michel Merlin <michel.merlin@laposte.net>
  Newsgroup: news://msnews.microsoft.com/microsoft.public.outlookexpress.general
  Message: news://msnews.microsoft.com/u0Sfc6bOHHA.1248@TK2MSFTNGP02.phx.gbl
  Subject: OE can't edit HTML source of UTF-8 European messages
  Posted: Tue 16 Jan 2007 23:35:20 +0100 (22:35:20 GMT)
  
  2) On Sitepoint, http://www.sitepoint.com/forums/showthread.php?t=450442&page=2#post3250318 "Please post successful test of source-editing UTF-8 European HTML", posted Sun 21 Jan 2007 16:39:10 GMT (images explaining the garbage caused by UTF-8 when source-editing European text in OE)
  
  Versailles, Mon 12 Mar 2007 15:33:25 +0100
  
  - Nobody/Anonymous - 2007-08-28
    
    Thank you, very interesting information. However, is there a way to convert to utf8 in npp?
    
    - Nobody/Anonymous - 2007-11-22
      
      [quote: michelmerlin] UTF-8 (as Microsoft itself admitted) causes problems in conversions. Its goal is nice (representing all chars on earth with a single charset, with convenience and reliability), but it has not been reached yet; for now it looks more like an Esperanto-like attempt at speaking a single language; IMO it has the potential to achieve better, but has NOT yet reached the reliability needed for wide use [/quote]
      
      I'm also interested in a "Convert to UTF-8" function (UTF-8 or whatever). I don't see how "Encode to UTF-8" can be useful.
      
      - Don HO - 2007-11-22
        
        > I'm also interested in a "Convert to UTF-8" function (UTF-8 or whatever). I don't see how "Encode to UTF-8" can be useful.
        I'll consider it.
        
        In the meantime, you can use the following steps to get what you want:
        1. Ctrl+A
        2. Ctrl+X
        3. Menu Format->Encode in UTF-8
        4. Ctrl+V
        5. Ctrl+S
        
        You can record this sequence as a macro for later use.
        
        Don
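
Outside the editor, the same result can be had by decoding the file's bytes with the charset it was really saved in and writing them back as UTF-8. A minimal Python sketch, assuming the source file is Windows-1252 (the usual "ANSI" on Western European Windows); `convert_to_utf8` and the file names are hypothetical and have nothing to do with Notepad++ internals:

```python
import tempfile
from pathlib import Path

def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Re-encode a text file as UTF-8 (hypothetical helper, not part of Notepad++)."""
    # Decode the raw bytes with the charset the file was actually saved in ...
    text = src.read_bytes().decode(source_encoding)
    # ... then write the same characters back out as UTF-8 bytes (no BOM).
    dst.write_bytes(text.encode("utf-8"))

# Small self-contained demo using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "ansi.txt"
    dst = Path(tmp) / "utf8.txt"
    src.write_bytes("déjà vu".encode("cp1252"))
    convert_to_utf8(src, dst)
    result = dst.read_bytes()

print(result)
```

If the guess for `source_encoding` is wrong, the output will be valid UTF-8 but contain the wrong characters, so the original charset has to be known or checked by eye.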
        
      - Nobody/Anonymous - 2007-11-22
        
        I forgot to write my comment to michelmerlin: "I can't believe what I read."
        
- Nobody/Anonymous - 2007-09-26
  
  Opened a file with various editors and saw a NUL before every character! Even with binary editor NUL everywhere. "View whitespace" WAS turned off. Turns out file was saved as Unicode with MS Notepad. Re-saved as ANSI & no more NULs.
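
This symptom is consistent with the file being UTF-16, which is most likely what MS Notepad's "Unicode" option writes: every ASCII character takes two bytes, one of which is NUL. A short Python sketch of the effect (the sample string is arbitrary, and the byte order of the poster's file is an assumption):

```python
sample = "Hi"
le = sample.encode("utf-16-le")  # b'H\x00i\x00'  NUL follows each character
be = sample.encode("utf-16-be")  # b'\x00H\x00i'  NUL precedes each character

# An editor that assumes a single-byte charset shows the NULs as extra characters.
misread = le.decode("cp1252")
nul_count = misread.count("\x00")

# Decoding with the right charset gives the text back without any NULs.
recovered = le.decode("utf-16-le")

print(le, be, nul_count, recovered)
```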
  