NSIS: Nullsoft Scriptable Install System / Bugs / #1167 Encoding info in nlf and nsh files

Anders - 2016-12-02

English is pure ASCII and will work the same on all PCs. Danish is not: it has the Nordic characters, so it needs to specify the encoding and cannot use '-'.

The codepage listed in the .nlf is used internally by the compiler and by the generated ANSI installers, but the language files in NSIS 3 are UTF-8 BOM for various reasons. The language files in NSIS 2 are basically treated as binary files and the strings are copied as-is to the generated installer. This makes it hard for people dealing with the files, because they have to use the correct encoding in their editor and diffing tools, and it is impossible for SVN and other server-side tools to know which encoding to use.

For languages that are Unicode-only, the compiler will complain if the .nlf file is not UTF-8 BOM.
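
As a rough illustration of what "UTF-8 BOM" means in practice, the Python sketch below checks whether a file's bytes start with the three-byte UTF-8 signature (EF BB BF). It is not the check the NSIS compiler performs; the function names are hypothetical and the sample bytes are made up for the example.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def decode_language_file(data: bytes) -> str:
    """Decode bytes as UTF-8, dropping a leading BOM if one is present."""
    return data.decode("utf-8-sig")


# Made-up sample content, standing in for the first bytes of a language file.
with_bom = codecs.BOM_UTF8 + "# Header, don't edit".encode("utf-8")
without_bom = "# Header, don't edit".encode("utf-8")

bom_detected = has_utf8_bom(with_bom)
bom_missing = not has_utf8_bom(without_bom)
same_text = decode_language_file(with_bom) == decode_language_file(without_bom)
```

Reading a real file would only need `open(path, "rb").read()` in front of these calls.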


scootergrisen - 2016-12-02

OK, but why not use UTF-8 for English anyway?
Maybe at some point you want to use some special characters that are not in ASCII.

So should my Danish file say the following?

# Codepage - dash (-) means ANSI code page
UTF-8 BOM

And do you have a link to some page that says what to use, so I can remember?
- Anders - 2016-12-02
  
  The English file is UTF-8 already. All ASCII files are UTF-8 because ASCII is a subset of UTF-8! (We also want the BOM because then we know for sure that the file really is UTF-8, and the BOM is supported by most Windows-based editors.)
  
  The codepage line in the .nlf should be set to the Windows codepage required to display that particular language (it is also used by the installer to filter out languages in the language selection dialog) and should never be "UTF-8". As a translator, however, you should only work with UTF-8 BOM files. It is our preferred encoding for text files; it makes sure there are no conversion issues and that everyone who views the file sees the same content.
  
  You should not have to remember which encoding to use. If you just start with an NSIS 3 language file, your text editor should detect the BOM at the start of the file and know that it is UTF-8 encoded text. If you use Notepad2 etc., it should say "UTF-8 BOM" or "UTF-8 Sig" in the status bar.
  
  It is not a big deal for Danish because I can convert on my end and verify the result but for languages I cannot read it is hard to know if something is encoded correctly or not.
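
The two points above (ASCII being a subset of UTF-8, and codepage-encoded bytes being hard to verify) can be sketched in Python. This is only an illustration under stated assumptions: the sample strings are hypothetical and not taken from any NSIS language file, and codepage 1250 is picked arbitrarily as the "wrong guess".

```python
import codecs

# Hypothetical sample strings, not taken from any real NSIS language file.
english = "Please select a language."
danish = "Vælg venligst et sprog."

# Pure ASCII text produces identical bytes in ASCII, UTF-8 and cp1252.
ascii_is_utf8 = (
    english.encode("ascii") == english.encode("utf-8") == english.encode("cp1252")
)

# Non-ASCII text does not: the same string has different bytes per encoding.
danish_cp1252 = danish.encode("cp1252")
danish_utf8 = danish.encode("utf-8")

# Codepage bytes carry no label, so a tool that guesses the wrong codepage
# (here 1250, Central European) decodes without error but shows other letters.
misread = danish_cp1252.decode("cp1250")

# With a BOM the file identifies itself and decodes unambiguously.
danish_bom = danish.encode("utf-8-sig")
roundtrip = danish_bom.decode("utf-8-sig")
```

The misread case raises no exception, which is why a wrongly encoded file in a language the reviewer cannot read is easy to miss.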
  

scootergrisen - 2016-12-02

Are you telling me I need to have this:

# Codepage - dash (-) means ANSI code page
1252

Even though the file is saved with UTF-8 BOM encoding?
I just assumed it was the encoding to use when saving the file.
- Anders - 2016-12-02
  
  Correct. NSIS 3 uses Unicode internally, and if you choose to build an ANSI installer with "Unicode false" in your script, the NSIS compiler will convert from Unicode to the codepage listed in the .nlf (1252 for Danish) when it generates the installer.
  
  The compiler can in theory handle files encoded with the codepage listed in the .nlf but other tools cannot so we prefer UTF-8.
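
To show why the codepage line still matters for ANSI builds, here is a minimal Python sketch of a Unicode-to-codepage conversion. It is not how the NSIS compiler is implemented; `unrepresentable_chars` is a hypothetical helper, the sample strings are invented, and mapping a codepage number to Python's `cp<number>` codec name is an assumption that holds for the Windows codepages 1250 to 1258.

```python
from typing import List


def unrepresentable_chars(text: str, codepage: int) -> List[str]:
    """Return the characters of text that the given Windows codepage cannot encode."""
    codec = f"cp{codepage}"  # assumption: Python codec name for a Windows codepage
    missing: List[str] = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# Hypothetical translated strings, invented for this example.
danish = "Fortsæt installationen på Ærø"
russian = "Привет"

danish_ok = unrepresentable_chars(danish, 1252)    # Danish letters exist in 1252
russian_bad = unrepresentable_chars(russian, 1252) # Cyrillic does not
russian_ok = unrepresentable_chars(russian, 1251)  # 1251 is the Cyrillic codepage
```

A translation stored as UTF-8 can be checked this way against the codepage number in its .nlf before anyone builds an ANSI installer from it.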
  

scootergrisen - 2016-12-02

Maybe add a comment in the translation files saying that this codepage is only used for ANSI installers (Unicode false).

Maybe it is just me not having read the translation instructions (if any) correctly, but if it is not already done, it would be nice to have this info on a page on the website for others to find, and for me to find later when I forget.


Anders - 2016-12-02

The only reason I even brought this up in my comment was because you specifically said "Let me know if there are any problems with encoding, newlines or whatever". We can handle codepage-specific files, but it is more work on our end and there is a chance that someone makes an encoding mistake. UTF-8 BOM makes it very clear to all programs what the encoding is, but I realize it can be a bit confusing because there is a codepage field in the .nlf that you are supposed to fill in.


Anders - 2016-12-26

labels: --> translation

status: open --> closed-fixed