Although I have only a limited understanding of this subject (I have spent
the last several days studying and trying to understand the character set
mess), I am not sure that Unicode support alone is enough (at least for what
I'm trying to do).
I know that the ASCII character set maps directly (same byte values) to the
UTF-8 encoding of Unicode. However, the legacy character sets used for other
languages do not map directly. For example, a Win95 drive used in an Arab
country will be using either the MS Windows 1256 codepage or the MS-DOS 720
codepage. If those bytes are read as ASCII or UTF-8, filenames using the
language-specific characters in those codepages will come out as garbage.
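To make the mismatch concrete, here is a minimal Python sketch. It is only an
illustration, not anything taken from Sleuthkit: the filename and the
decode_name helper are hypothetical, and it assumes Python's standard cp1256
and cp720 codecs are a fair stand-in for the Windows and DOS Arabic codepages
mentioned above.

```python
# Hypothetical Arabic filename, written with escapes so the source stays ASCII.
# U+0645 U+0644 U+0641 are the letters meem, lam, feh.
name = "\u0645\u0644\u0641.txt"

# The same name stored as bytes under each legacy codepage.
raw_1256 = name.encode("cp1256")  # Windows Arabic codepage
raw_720 = name.encode("cp720")    # DOS Arabic codepage


def decode_name(raw: bytes, codepage: str) -> str:
    """Decode raw filename bytes; undecodable bytes become U+FFFD."""
    return raw.decode(codepage, errors="replace")


# The right codepage recovers the original name.
print(ascii(decode_name(raw_1256, "cp1256")))

# Treating the bytes as ASCII loses every Arabic letter.
print(ascii(decode_name(raw_1256, "ascii")))

# The wrong Arabic codepage decodes without error but gives the wrong text.
print(ascii(decode_name(raw_1256, "cp720")))
```

The last case is the troublesome one: nothing fails, so a tool has no way to
notice the problem unless the user tells it which codepage the disk used.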
Again, my knowledge of this subject is limited. I am also still trying to
get my FreeBSD system to recognize alternate character sets when mounting a
foreign disk image. This whole subject is just a big steaming pile of crap.
Unicode is great, but there is a ton of legacy stuff behind it that is still
floating around causing problems.
-Matt
-----Original Message-----
From: Brian Carrier [mailto:ca...@sl...]
Sent: Wednesday, September 22, 2004 2:37 PM
To: Kucenski, Matthew A.
Cc: 'sle...@li...'
Subject: Re: [sleuthkit-users] File/Directories Names using alternate
encoding formats?
On Sep 22, 2004, at 12:41 PM, Kucenski, Matthew A. wrote:
> If I have a drive that was used in a foreign country and the filenames
> are encoded using a different codepage, what will Sleuthkit do with
> those file names? Should it just display garbage? Has any thought been
> put into allowing the user to change the codepage when running the
> various tools?
> Just looking to find out where the tool stands on this subject.
The current version changes all Unicode characters into ASCII, so you
will lose the special characters. There is a patch for an older
version of TSK at:
http://www.monyo.com/technical/unix/TASK/
My plan is to store both versions of the name in TSK v2 and display
whichever one is specified on the command line. But that may not be
for a while.
brian