ddru_ntfsbitmap.c and ddru_ntfsfindbad.c init iconv() as follows:
cd = iconv_open (outtype, "UTF-16");
The "UTF-16" type accepts little- or big-endian UTF-16 but expects the string to begin with a BOM (byte order mark). If the BOM is missing, an implementation-dependent default is used.
NTFS file names use LE and no BOM. If ddrutility is built with a libiconv that uses BE as its default, the conversion produces bogus results.
Fix:
cd = iconv_open (outtype, "UTF-16LE");
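To illustrate the fix, here is a minimal sketch of converting a BOM-less UTF-16LE name with an explicit byte order. The helper name utf16le_to_utf8 and the fixed "UTF-8" output type are assumptions for this example only; the ddrutility sources pass their own outtype and are structured differently.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not from ddrutility): convert a BOM-less UTF-16LE
 * byte string to a NUL-terminated UTF-8 string.
 * Returns 0 on success, -1 on any failure. */
static int utf16le_to_utf8(const char *in, size_t in_len,
                           char *out, size_t out_size)
{
    if (out_size == 0)
        return -1;

    /* Naming the byte order explicitly avoids the implementation-dependent
     * default that plain "UTF-16" falls back to when no BOM is present. */
    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");
    if (cd == (iconv_t)-1)
        return -1;

    char *in_ptr = (char *)in;        /* iconv() takes a non-const char ** */
    char *out_ptr = out;
    size_t in_left = in_len;
    size_t out_left = out_size - 1;   /* reserve room for the terminator */

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *out_ptr = '\0';
    return 0;
}
```

On platforms that use the standalone GNU libiconv, linking may additionally require -liconv.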
I will look into this. I do not remember seeing an option for UTF-16LE in the documentation I looked at, but then again I was spending more time working on the memory leak that iconv has than on anything else. If all is good I will make this change.
On a side note, ddru_ntfsfindbad.c does have the option "-e, --encoding <encoding>" to set the encoding manually. Although I guess this would not help with ddru_ntfsbitmap as it does not have that option.
The problem is not visible on most Linux systems using GNU libc (https://www.gnu.org/software/libc/). The iconv() implementation in glibc defaults to the byte order of the target machine if "UTF-16" is specified without LE/BE and no BOM is present.
This violates the following recommendation from the UTF-16 RFC (https://tools.ietf.org/html/rfc2781#section-4.3): "If the first two octets of the text is not 0xFE followed by 0xFF, and is not 0xFF followed by 0xFE, then the text SHOULD be interpreted as being big-endian."
This means that the current ddru_ntfs... tools fail on big-endian Linux and on systems using an iconv() implementation that follows the above recommendation.
This is the case for the independent GNU libiconv library (https://www.gnu.org/software/libiconv/) which is used on Cygwin. Cygwin is a platform that IMO should be supported by the ddrutility upstream sources because it is very useful to have these NTFS related tools available on Windows itself.
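The byte-order rule quoted from RFC 2781 above, and why it breaks BOM-less NTFS names, can be sketched as follows. The names rfc2781_order and read_unit are hypothetical and exist only for this illustration; they are not part of any iconv implementation.

```c
#include <stddef.h>
#include <stdint.h>

enum utf16_order { UTF16_LE, UTF16_BE };

/* Byte order chosen by the RFC 2781 section 4.3 rule: a leading FF FE
 * means little-endian; a leading FE FF or no BOM at all means big-endian. */
static enum utf16_order rfc2781_order(const unsigned char *buf, size_t len)
{
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return UTF16_LE;
    return UTF16_BE;
}

/* Read one 16-bit code unit from two bytes in the given byte order. */
static uint16_t read_unit(const unsigned char *p, enum utf16_order order)
{
    return order == UTF16_LE ? (uint16_t)(p[0] | (p[1] << 8))
                             : (uint16_t)((p[0] << 8) | p[1]);
}
```

For the NTFS-style bytes 0x41 0x00 (the letter "A" in UTF-16LE, no BOM), this rule selects big-endian, so the code unit is read as U+4100 instead of U+0041. That is the kind of bogus result described above.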