How does NP++ detect encoding of a file?

Anonymous
2010-08-24
2012-11-13
  • Anonymous - 2010-08-24

    How does NP++ (or any other editor) detect the actual encoding (ANSI, UTF-8, UTF-8 with BOM, UTF-16 LE,…) of a file?
    When it loads a file, the file is at first just a stream of bytes.
    Is there some kind of encoding marker in the first bytes of a file?
    Or does NP++ really read the full content first, then interpret the content and make a guess?

    Is such a guess always correct?

    The detection method must be compatible with other editors, because they should be able to read file content written by NP++.

    Thomas

     
  • cchris

    cchris - 2010-08-24

    If the file has a BOM, NP++ detects it and therefore knows the encoding.
    If the file is HTML or XML, the encoding is read from the encoding declaration at the top of the file.
    Otherwise, NP++ takes a guess between UCS-2 LE, UCS-2 BE and ANSI. You cannot always tell the difference between a file encoded in UTF-8 without a BOM and an ANSI file containing plenty of high-ASCII characters (bytes above 127).
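
    The detection order described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Notepad++'s actual source: `guess_encoding`, `BOMS` and `XML_DECL` are hypothetical names, only the XML declaration is handled (not HTML `<meta charset>`), "ANSI" is assumed to mean Windows-1252, and the UCS-2 check is a simple zero-byte heuristic that only works on mostly-ASCII text. The returned values are Python codec names.

```python
import codecs
import re

# Byte-order marks, checked in this order so that the UTF-32 LE BOM
# (FF FE 00 00) is not mistaken for the UTF-16 LE BOM (FF FE).
# Each codec name below consumes the BOM when decoding.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# An XML declaration such as <?xml version="1.0" encoding="ISO-8859-1"?>
XML_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: return a Python codec name guessed from raw bytes."""
    # 1. A BOM is an explicit signature at the very start of the file.
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec

    # 2. An XML declaration names the encoding explicitly.
    match = XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()

    # 3. Heuristic for BOM-less UCS-2/UTF-16: mostly-ASCII text has a zero
    #    byte in every other position (odd offsets for LE, even for BE).
    sample = data[:1024]
    if len(sample) >= 2:
        even, odd = sample[0::2], sample[1::2]
        even_zeros = even.count(0) / len(even)
        odd_zeros = odd.count(0) / len(odd)
        if odd_zeros > 0.5 and even_zeros < 0.1:
            return "utf-16-le"
        if even_zeros > 0.5 and odd_zeros < 0.1:
            return "utf-16-be"

    # 4. Otherwise assume the ANSI code page (Windows-1252 here).
    #    Valid UTF-8 without a BOM also ends up in this branch.
    return "cp1252"


print(guess_encoding(codecs.BOM_UTF8 + b"hello"))   # utf-8-sig
print(guess_encoding("hello".encode("utf-16-le")))  # utf-16-le
print(guess_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))  # iso-8859-1

# The ambiguity: the two bytes C3 A9 are "é" in UTF-8 but are equally
# valid Windows-1252 text ("Ã©"), so step 4 cannot tell them apart.
raw = "é".encode("utf-8")
print(raw.decode("utf-8"))   # é
print(raw.decode("cp1252"))  # Ã©
print(guess_encoding(raw))   # cp1252
```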

    CChris

     
