Thread: [sleuthkit-developers] unicode encoding/weird file names?


I was having some issues with non-ASCII characters in fls results,
and ran across the email copied below my message.  At first, I thought
only the Microsoft-added Windows Latin characters were affected, but I
soon saw that it is anything non-ASCII.  I am currently running TSK 2.08.

Does this issue still exist?  Is there any current effort to add
wider character support to the TSK libs?

I don't know much about converting between encodings, but I tried
treating the filename from FS_DENT as a multibyte string and running
it through mbstowcs, with no success.  Is that a possible direction for
a workaround?
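
For reference, a minimal sketch of the mbstowcs attempt described above.
The helper name name_to_wide is hypothetical and is not part of TSK.
mbstowcs decodes bytes according to the LC_CTYPE category of the current
locale, and a C program starts in the "C" locale unless it calls
setlocale, so the input bytes have to be in that locale's encoding for
the call to succeed.  If the name has already been reduced to the low
byte of each UTF-16 unit, as the email below describes, no decoding
step can bring the original characters back.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not a TSK function): decode a NUL-terminated
 * multibyte name into wide characters using the current locale.
 * Returns the number of wide characters written (excluding the
 * terminator), or (size_t)-1 if the bytes are not valid in the
 * current locale's encoding or out_cap is 0. */
size_t name_to_wide(const char *name, wchar_t *out, size_t out_cap)
{
    if (out_cap == 0)
        return (size_t)-1;

    /* Reserve one slot so the result can always be terminated. */
    size_t n = mbstowcs(out, name, out_cap - 1);
    if (n == (size_t)-1)
        return n;               /* invalid multibyte sequence */

    out[n] = L'\0';
    return n;
}
```

A caller would normally run setlocale(LC_CTYPE, "") once at startup so
that the user's locale, rather than the default "C" locale, decides how
the bytes are interpreted.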

Thanks much!

email:
If we are considering an NTFS file system, then all names are stored as
UTF-16 Unicode, but TSK takes only the lower byte of each character and
turns it into ASCII.  With FAT, the original 8.3 directory entry stores
only single-byte characters, and those can come from various code pages
(I don't think the actual page is defined in the file system, though).
FAT long file names are stored in UTF-16, so the Unicode name can be
used for them.  Therefore, if you have a FAT file system with Arabic
names then (I think) the short name will use a code page and the long
name will use Unicode.
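
To illustrate the difference, here is a minimal sketch, assuming the
name is available as raw little-endian UTF-16 bytes.  Both function
names are hypothetical and neither is TSK code.  The first keeps only
the low byte of each code unit, which is the lossy behaviour described
above; the second converts the same bytes to UTF-8 and so keeps every
character.

```c
#include <stddef.h>
#include <stdint.h>

/* Lossy: keep only the low byte of each UTF-16LE code unit.
 * Returns the number of bytes written, excluding the terminator. */
size_t utf16le_low_bytes(const uint8_t *src, size_t src_len,
                         char *dst, size_t dst_cap)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < src_len && n + 1 < dst_cap; i += 2)
        dst[n++] = (char)src[i];        /* little-endian: low byte first */
    if (dst_cap > 0)
        dst[n] = '\0';
    return n;
}

/* Lossless: convert UTF-16LE to UTF-8.  Unpaired surrogates become
 * U+FFFD.  Stops early if dst is too small.  Returns the number of
 * bytes written, excluding the terminator. */
size_t utf16le_to_utf8(const uint8_t *src, size_t src_len,
                       char *dst, size_t dst_cap)
{
    size_t n = 0;
    if (dst_cap == 0)
        return 0;

    for (size_t i = 0; i + 1 < src_len; i += 2) {
        uint32_t cp = src[i] | ((uint32_t)src[i + 1] << 8);

        /* Combine a high surrogate with a following low surrogate. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 3 < src_len) {
            uint32_t lo = src[i + 2] | ((uint32_t)src[i + 3] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            }
        }
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;                /* unpaired surrogate */

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + need >= dst_cap)
            break;                      /* keep room for the terminator */

        if (need == 1) {
            dst[n++] = (char)cp;
        } else if (need == 2) {
            dst[n++] = (char)(0xC0 | (cp >> 6));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[n++] = (char)(0xE0 | (cp >> 12));
            dst[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[n++] = (char)(0xF0 | (cp >> 18));
            dst[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    dst[n] = '\0';
    return n;
}
```

For the Arabic letter U+0633 followed by "a" (bytes 33 06 61 00), the
first function yields the two bytes "3a", while the second yields the
three UTF-8 bytes D8 B3 61.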

In either case, TSK may not even show you the non-ASCII name because it
requires the name to be valid ASCII.  This is obviously too restrictive
in light of code pages and such.  Once TSK becomes Unicode-aware, this
will change as well.

brian