
#1101 Unicode setting is likely to cause unexpected failures

Milestone: 3.0 Alpha Series
Status: closed-wont-fix
Owner: nobody
Labels: None
Priority: 5
Updated: 2016-12-26
Created: 2014-09-08
Creator: Alastair
Private: No

To compile a Unicode installer in 3.0b0, we use this compiler instruction:

Unicode true

However, there is a lot of room for unexpected behavior here. Imagine these two scenarios:

Unicode true
!include MyFuncs.nsh

vs.

!include MyFuncs.nsh
Unicode true

In the first case, NSIS_UNICODE is already defined when MyFuncs.nsh is included, so code in MyFuncs.nsh can use that definition to compile with Unicode support. In the second case, NSIS_UNICODE is not yet defined, so code in MyFuncs.nsh may compile for ANSI, which can cause breakages on certain systems (non-ASCII characters in user names is a common trigger). In both cases the developer probably expects their installer to use Unicode routines throughout.
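
For example, MyFuncs.nsh might contain something along these lines (the define names are made up, but the pattern is typical):

; MyFuncs.nsh
!ifdef NSIS_UNICODE
!define MYFUNCS_WIDE ; compile the Unicode variants of the functions
!else
!define MYFUNCS_ANSI ; compile the ANSI variants of the functions
!endif

Which branch is taken depends entirely on whether the Unicode instruction has already been seen at the point of the !include.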

I think that, for the sake of backwards compatibility and expected behavior, the compiler should throw an !error whenever a) the script has already referenced the NSIS_UNICODE definition in any way (dereference, !if, !ifdef, !ifndef, etc.) and b) the Unicode compiler instruction appears later. The error could read:

"Error: Unicode mode change after NSIS_UNICODE definition referenced. Please move the Unicode statement earlier in your script."

I.e. enforce that Unicode true must always come first. It is the only way to ensure that this change does not introduce very hard-to-trace bugs in the field. For headers that implement both Unicode and ANSI functions using NSIS_UNICODE for conditional compilation, this might mean that only the Unicode functions are accessible in Unicode NSIS, but that is no different from today, where you must use either the Unicode or the ANSI compiler and you get the corresponding functions. It's also the behavior developers will expect.
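
E.g. under this proposal, a script like the following would be rejected at compile time (MyFuncs.nsh being any header that tests NSIS_UNICODE):

!include MyFuncs.nsh   ; this header references NSIS_UNICODE internally
Unicode true           ; proposed: !error here, because NSIS_UNICODE was already referenced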

I will literally cry if I have to debug this in the field. Please don't make me cry, it'll ruin my manly image ;)

Discussion

  • Anders

    Anders - 2014-09-08

    At least one of the standard headers already uses a check to make sure things are as expected. You can make your own check; it should look something like this:

    !ifndef MYHEADER_NSH
    !define MYHEADER_NSH ${NSIS_CHAR_SIZE} ; remember the char size seen at !include time
    
    !macro MyMacro
    !if ${NSIS_CHAR_SIZE} != ${MYHEADER_NSH}
    !error "NSIS_CHAR_SIZE changed after MyHeader was included" ; (example message)
    !endif
    ; ... the real macro body goes here ...
    !macroend
    
    !endif ;~ MYHEADER_NSH
    
     

    Last edit: Anders 2014-09-08
  • Alastair

    Alastair - 2014-09-08

    Hi Anders,

    Firstly, thanks for your work merging the Unicode code back into mainline NSIS.

    With regard to this bug, I'm aware that I can perform a check to make sure things are as expected, but I don't want to have to. The Unicode setting shouldn't be allowed to change in the middle of a script.

    Think about it: because I am aware of the problem, I could make my own libraries more robust if I knew with certainty how others will use them, and also how all of the components my own script uses behave. But I don't know either of those things. For a start, my QA time just went up, because I now have to ask a non-developer to test for hard-to-detect errors across 20 library functions. Then what about the library functions my script might use? WordFunc, FileFunc, LogicLib, maybe something more obscure that's not in the official distribution, library functions contributed by others in my company who perhaps don't know about the issue or don't understand it as well as I do, how plugins behave, how the System plugin might translate things in various cases if I use the m/w/t types, and so on. Do I have to go and investigate all of these things now? The reason I use them in the first place is to save time, because others have already contributed working implementations.

    The fact that one of the standard headers is already doing this suggests to me that it has already been a problem for someone. I can certainly see it being a problem with the dozens of partners who work with my NSIS scripts. To begin with, my documentation just became more complex; I may have to debug weird issues in a partner's compiled binary at some point; and while I can put checks all over my library, that is very messy for something that I feel should be solved in NSIS.

    Unicode switching isn't allowed in 2.x, and it shouldn't be allowed in 3.x either. It's just asking for trouble. We look to frameworks like NSIS because they provide what we need with better quality, reliability, and speed of development than we could throw together ourselves. My suggestion is to protect that reliability by squashing this before 3.0 final, so that I can say to my partners "We like this new version, everybody upgrade!" rather than "If you upgrade, you need to be painfully aware of these pitfalls."

    So, just to reiterate my suggestion: set a flag the first time something references NSIS_UNICODE, then error if you see the Unicode command while the flag is set. This should be fairly easy, and it solves the problem while preserving your new syntax completely, with no additional constraints beyond those of NSIS 2.x.

     
  • Anders

    Anders - 2014-09-08

    Unicode switching in 2.x is "not allowed" because the only output target supported by the official code was ANSI.

    The only file that cares is Win\COM.nsh because IID_IShellLink is mapped to IID_IShellLinkA or IID_IShellLinkW and the mapping has to stay the same when you actually try to use this IID. A header that exposes GUIDs for general usage by other code is rare and having one that depends on the string type is extremely rare.
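
    Something like this, in other words (a simplified sketch; the define name here is mine, not the one the header actually uses):

    !ifdef NSIS_UNICODE
    !define MY_IID_IShellLink {000214F6-0000-0000-C000-000000000046} ; IShellLinkW
    !else
    !define MY_IID_IShellLink {000214EE-0000-0000-C000-000000000046} ; IShellLinkA
    !endif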

    Most code that uses the system plugin should not care when "Unicode true" is used, you either support Unicode or you don't.

    The only case where this is a problem is in a .nsh with:

    !ifdef NSIS_UNICODE
    !define parameter 666
    !else
    !define parameter 667
    !endif
    !macro bar
    System::Call 'dll::func(i ${parameter})'
    !macroend
    

    ...and nobody is going to write code like that. Just moving the !ifdef NSIS_UNICODE check inside the macro would fix it.
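
    I.e. something like this instead (same made-up numbers as above):

    !macro bar
    !ifdef NSIS_UNICODE
    System::Call 'dll::func(i 666)'
    !else
    System::Call 'dll::func(i 667)'
    !endif
    !macroend

    Now the value is chosen when the macro is inserted, not when the header is included.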

    As soon as a plugin is called or installer data has been added then the Unicode mode is locked and cannot be changed...

    Do you actually have an example of code in the wild that breaks because of this?

     

    Last edit: Anders 2014-09-08
  • Alastair

    Alastair - 2014-09-08

    Or, for example,

    !ifdef NSIS_UNICODE
    !macro MyFunc
      ; Unicode stuff
    !macroend
    !else
    !macro MyFunc
      ; ANSI stuff
    !macroend
    !endif
    

    ... and again, sure, moving the !ifdef inside the macro can fix it. If you know it's even a problem.
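
    I.e. the reworked form of the example above would be something like:

    !macro MyFunc
    !ifdef NSIS_UNICODE
      ; Unicode stuff
    !else
      ; ANSI stuff
    !endif
    !macroend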

    Or, what if we have some code in one of our headers that uses the old "Un." definition to write out the install and uninstall versions of a function? The !ifdef gets resolved at !include time in that case. E.g.:

    !macro WriteMyFunc DEFUN
    Function ${DEFUN}MyFunc
        !ifdef NSIS_UNICODE
        ; ... do something
        !else
        ; ... do something else
        !endif
    FunctionEnd
    !macroend
    !insertmacro WriteMyFunc ""
    !insertmacro WriteMyFunc "un."
    

    To your points,

    Most code that uses the system plugin should not care when "Unicode true" is used, you either support Unicode or you don't.

    My code supports ANSI and Unicode NSIS and always calls the correct ANSI or Unicode APIs explicitly with the right types. I have some code that deals with memory allocation, too. I could see ${NSIS_CHAR_SIZE} and the size of the "t" type changing between the allocation of a buffer and the numerous places it might be used, and that causing problems. Allocating in ANSI and using in Unicode is a buffer-overrun scenario.
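
    As a contrived sketch of the kind of thing I mean (the header, the define and the dll export are all made up):

    ; MyAlloc.nsh captures the char size at !include time
    !define MYALLOC_CHARSIZE ${NSIS_CHAR_SIZE}
    !macro AllocBuf LEN
    IntOp $1 ${LEN} * ${MYALLOC_CHARSIZE} ; bytes, per the char size seen when the header was included
    System::Alloc $1
    Pop $0
    !macroend
    
    ; main script
    !include MyAlloc.nsh ; MYALLOC_CHARSIZE captured as 1 here (still ANSI)
    Unicode true ; if this is accepted here, "t" strings are now 2 bytes per char
    Section
    !insertmacro AllocBuf 260
    System::Call 'mydll::FillName(p r0, i 260)' ; made-up export writing 260 TCHARs, overrunning the 260-byte buffer
    System::Free $0
    SectionEnd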

    As soon as a plugin is called or installer data has been added then the Unicode mode is locked and cannot be changed...

    That doesn't provide any guarantee around the issue.

    Do you actually have an example of code in the wild that breaks because of this?

    I have many thousands of lines of code that I will need to vet against this if it's not fixed. Are you sure all the installers out there use the Unicode command before doing !include's? You're working on an extraordinarily widely deployed platform, do you know how all the code out there is written? I don't see any reason not to fix it and simply ensure the platform is robust and nobody has to be concerned with providing tests/workarounds/warnings.

    Also, just to be clear, I'm not here to have an argument. I merely see something I think will be a risk area going forward and I'm trying to get it addressed. If you still disagree, at least I gave my $.02 :)

    Thanks again.

     
  • Anders

    Anders - 2014-09-08

    Yes, your "macro that creates functions" example can*1 have issues. How hard it is to tackle this in the compiler I don't know; it means tracking at least NSIS_UNICODE, NSIS_CHAR_SIZE and NSIS_PTR_SIZE in several places in the codebase.

    Allocating buffers of TCHARs is not uncommon but it does not cause issues:

    Section
    IntOp $0 $0 * ${NSIS_CHAR_SIZE} ; (The Unicode mode is actually locked here)
    System::Call 'dll::MemAlloc(ir0)p.r1' ; Unicode mode is locked after this and NSIS_CHAR_SIZE cannot change again
    System::Call 'dll::GetItems(pr1)'
    SectionEnd
    

    Most system plugin code should not check NSIS_UNICODE and then call the A or W version. If the t type is used then the plugin will automatically look for the A or W version of the function for you. You should never mix the t type with a hardcoded A or W suffix...
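
    For example, the classic MessageBox call works in either mode because the plugin finds the A or W export for you:

    Section
    System::Call 'user32::MessageBox(p $HWNDPARENT, t "Hello", t "World", i 0)'
    ; not 'user32::MessageBoxW(...)' together with t parameters
    SectionEnd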

    *1:

    !macro makefunc
    Function Foo
    !ifdef NSIS_UNICODE
    DetailPrint W
    !else
    DetailPrint A
    !endif
    FunctionEnd
    !macroend
    !insertmacro makefunc
    Unicode true ; Compiler error...
    
     

    Last edit: Anders 2014-09-08
  • Alastair

    Alastair - 2014-09-08

    There is also lots of code that uses System to talk to custom DLLs, and some of those are not designed with A/W variants. I agree about not mixing the t type with explicit proc names; that would indeed be bad.

    I have some code that allocates memory in one place and uses it in another; it would be bad if NSIS_CHAR_SIZE changed in between.

    FYI, although I'm not an NSIS admin/project dev here, as background: I've been developing with NSIS, mostly professionally, since the very early versions, probably for at least 10 years. I'm very familiar with System.dll/ANSI/Unicode/C++. Like you guys, I'm just trying to look out for the platform.

     
    • Anders

      Anders - 2014-09-08

      Exported functions that are not A/W should not be an issue. If a function only has one string type then you are stuck with m or w and there are no issues. If it depends on the platform then you have to decide at runtime anyway:

      !include "LogicLib.nsh" ; for ${If}/${Else}/${EndIf}
      !include "WinVer.nsh" ; for ${IsNT}
      Section
      ${If} ${IsNT}
      System::Call NTDLL::Foo()
      ${Else}
      System::Call Kernel32::Bar()
      ${EndIf}
      SectionEnd
      

      ...even if you decide to optimize, it should still not break, because the mode should be locked once you start calling things:

      Section
      !ifndef NSIS_UNICODE
      ${If} ${IsNT}
      !endif
      System::Call NTDLL::Foo() ; Mode is locked here so NSIS_UNICODE will not change again
      !ifndef NSIS_UNICODE
      ${Else}
      System::Call Kernel32::Bar()
      ${EndIf}
      !endif
      SectionEnd
      
       

      Last edit: Anders 2014-09-08
  • Anders

    Anders - 2016-12-26
    • status: open --> closed-wont-fix
     
