
iconv() may produce bogus results due to missing BOM

2014-08-08
2014-08-11
  • Christian Franke

    ddru_ntfsbitmap.c and ddru_ntfsfindbad.c init iconv() as follows:

      cd = iconv_open (outtype, "UTF-16");
    

    The "UTF-16" type accepts little- or big-endian UTF-16 but expects the string to begin with a BOM (byte order mark). If the BOM is missing, an implementation-dependent default byte order is used.

    NTFS file names use LE and no BOM. If ddrutility is built with a libiconv that uses BE as its default, the conversion produces bogus results.

    Fix:

      cd = iconv_open (outtype, "UTF-16LE");
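
For illustration, here is a minimal sketch of the same fix in a self-contained form. The helper name ntfs_name_to_utf8 and the fixed "UTF-8" output type are assumptions made for this example; they are not taken from the ddrutility sources, which pass outtype instead.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not from ddrutility): convert an NTFS file name,
   stored as UTF-16LE without a BOM, to a NUL-terminated UTF-8 string.
   Returns 0 on success, -1 on failure. */
static int ntfs_name_to_utf8(const unsigned char *name, size_t name_bytes,
                             char *dst, size_t dst_size)
{
    assert(name != NULL && dst != NULL && dst_size > 0);

    /* Name the byte order explicitly so that no BOM is required. */
    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)name;          /* iconv() takes a non-const pointer */
    size_t in_left = name_bytes;
    char *out = dst;
    size_t out_left = dst_size - 1;   /* reserve room for the terminator */

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *out = '\0';
    return 0;
}
```

Because the byte order is part of the encoding name, the result should no longer depend on which default the iconv() implementation picks.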
    
     
  • maximus57

    maximus57 - 2014-08-08

    I will look into this. I do not remember seeing an option for UTF-16LE in the documentation I looked at, but then again I was spending more time working on the memory leak that iconv has than on anything else. If all is good I will make this change.

    On a side note, ddru_ntfsfindbad.c does have the option "-e, --encoding <encoding>" to set the encoding manually. Although I guess this would not help with ddru_ntfsbitmap as it does not have that option.

     
  • Christian Franke

    The problem is not visible on most Linux systems using GNU libc (https://www.gnu.org/software/libc/). The iconv() implementation in glibc defaults to the byte order of the target machine if "UTF-16" is specified without LE/BE and no BOM is present.

    This violates the following recommendation from the UTF-16 RFC (https://tools.ietf.org/html/rfc2781#section-4.3): "If the first two octets of the text is not 0xFE followed by 0xFF, and is not 0xFF followed by 0xFE, then the text SHOULD be interpreted as being big-endian."
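
The quoted rule can be sketched in a few lines of C. This is only an illustration of the recommendation; the enum and function names are hypothetical and do not correspond to any libiconv or glibc interface.

```c
#include <stddef.h>

enum utf16_order { UTF16_BE, UTF16_LE };

/* Apply the RFC 2781 section 4.3 rule to text labelled plain "UTF-16".
   *bom_len receives the number of leading BOM bytes to skip (0 or 2). */
static enum utf16_order utf16_detect_order(const unsigned char *p, size_t n,
                                           size_t *bom_len)
{
    *bom_len = 0;
    if (n >= 2) {
        if (p[0] == 0xFE && p[1] == 0xFF) {   /* big-endian BOM */
            *bom_len = 2;
            return UTF16_BE;
        }
        if (p[0] == 0xFF && p[1] == 0xFE) {   /* little-endian BOM */
            *bom_len = 2;
            return UTF16_LE;
        }
    }
    /* No BOM: the RFC says the text SHOULD be treated as big-endian. */
    return UTF16_BE;
}
```

An NTFS name such as the bytes 24 00 4D 00 has no BOM, so a decoder following this rule reads it as big-endian even though the data is little-endian.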

    This means that the current ddru_ntfs... tools fail on big-endian Linux and on systems using an iconv() implementation that follows the above recommendation.
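
To see what such a failure looks like, the following sketch converts the same bytes under a caller-chosen source encoding. The helper name utf16_to_utf8 is made up for this example, and UTF-8 output is assumed.

```c
#include <iconv.h>
#include <stddef.h>

/* Illustrative helper (hypothetical name): convert UTF-16 input to UTF-8,
   using the source encoding name passed in "from".
   Returns the number of UTF-8 bytes written, or -1 on failure. */
static long utf16_to_utf8(const char *from, const unsigned char *src,
                          size_t src_bytes, char *dst, size_t dst_size)
{
    if (dst_size == 0)
        return -1;

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)src;           /* iconv() takes a non-const pointer */
    size_t in_left = src_bytes;
    char *out = dst;
    size_t out_left = dst_size - 1;   /* reserve room for the terminator */

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *out = '\0';
    return (long)(out - dst);
}
```

For the BOM-less bytes 41 00 42 00, passing "UTF-16LE" gives "AB", while "UTF-16BE" gives the two characters U+4100 and U+4200. Passing plain "UTF-16" gives one or the other depending on the implementation's default, which is exactly the portability problem described here.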

    This is the case for the independent GNU libiconv library (https://www.gnu.org/software/libiconv/) which is used on Cygwin. Cygwin is a platform that IMO should be supported by the ddrutility upstream sources because it is very useful to have these NTFS related tools available on Windows itself.

     
