How does NP++ detect the encoding of a file?

thoazu
2010-08-24
2012-11-13
  • thoazu
    2010-08-24

    How does NP++ (or any other editor) detect the actual encoding (ANSI, UTF-8, UTF-8 with BOM, UCS-2 LE, …) of a file?
    When it loads a file, all it has at first is a stream of bytes.
    Is there some kind of encoding marker in the first bytes of a file?
    Or does NP++ really read the full content first, then interpret it and make a guess?

    Is such a guess always correct?

    The detection method must be compatible with other editors, because they should be able to read files written by NP++.

    Thomas

     
  • cchris
    2010-08-24

    If the file has a BOM, NP++ detects it and knows about the encoding.
    If the file is HTML or XML, the encoding is read from the first line of the file.
    Otherwise, NP++ takes a guess between UCS-2LE, UCS-2BE and ANSI. You cannot reliably tell a file encoded in UTF-8 without BOM from an ANSI file with plenty of high (non-ASCII) characters.

    CChris
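
For anyone who wants to see the idea in code, here is a rough Python sketch of the "check for a BOM first, then guess" approach described above. It is not NP++'s actual code. The function name `sniff_encoding`, the `cp1252` fallback standing in for "ANSI", and the "does it decode as UTF-8?" test are all assumptions made for illustration.

```python
import codecs

# BOM signatures, longest first: the UTF-32 LE BOM starts with the same
# two bytes as the UTF-16 LE BOM, so it has to be tested before it.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Return a guessed encoding name for raw file bytes (hypothetical helper)."""
    # Step 1: a BOM, if present, settles the question.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Step 2 (assumed heuristic, not taken from NP++): bytes that decode
    # cleanly as UTF-8 are treated as UTF-8; anything else falls back to
    # a single-byte "ANSI" code page.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


# The same two bytes are valid in both encodings, which is why a guess
# without a BOM can never be fully certain.
sample = b"caf\xc3\xa9"
as_utf8 = sample.decode("utf-8")    # 'café'
as_ansi = sample.decode("cp1252")   # 'cafÃ©'
```

The last three lines show the ambiguity mentioned above: the bytes `C3 A9` are a valid UTF-8 "é" and at the same time a valid two-character sequence in Windows-1252, so the detector can only pick the more plausible reading.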