Re: [sleuthkit-users] ntfs.c.patch
From: TAKAHASHI M. <mo...@ho...> - 2003-02-24 16:40:23
Hello, I am the author of ntfs.c.patch.

>> Mr. Takahashi wrote a patch which converts file
>> names into Unicode (UTF-8).
>> http://damedame.monyo.com/ntfs.c.patch
>>
>> However, it is not a complete patch.
>> If I use a long file name, a buffer overflow occurs.
>
> This is because the bounds on the 'asc' array are not being checked.

Yes, I know this patch is incomplete; a quick fix is to triple the
length of the "char *asc" buffer. It is also easy to add an option
that controls whether the UTF-8 feature is enabled. I have done both
in my private version of TASK.

The reason I did not create a complete patch is that I do not want to
distribute and maintain another (Japanese-localized) version of TASK;
I hope that TASK will support UTF-8 natively. Also, which character to
use for the UTF-8 option and how to store that option in TASK are
matters of coding policy.

In Japan, Japanese filenames are commonly used for business, and the
Japanese version of Windows also creates Japanese filenames by
default, so we strongly need support for them.

>> What happens when you run this on the command line? Does the shell
>> display the Japanese symbols? For example:
>>
>>   fls -f ntfs img.dd
>
> http://www.port139.co.jp/task/jp.dd (Size: 8 MB NTFS image)
>
> A shell cannot display Japanese characters (Unicode).
> Therefore we have to use Autopsy and a web browser.
> Most browsers support Unicode (UTF-8): IE, Netscape, Opera...

Indeed, as Hideaki said, UTF-8 is not widely used as an encoding for
Japanese. Most 'Japanese' shells can handle UTF-8 only as an opaque
8-bit byte string, and can display only the traditional Japanese
encodings such as EUC-JP and Shift_JIS. But using UTF-8 has several
merits:

- UTF-8 supports all characters (scripts) that NTFS supports.
- ASCII characters pass through UTF-8 unchanged.
- The conversion between UTF-8 and UTF-16 (UCS-2) is reversible.
For historical reasons, the conversion between the traditional
Japanese encodings (and character sets) and Unicode is not reversible
and has severe, complex problems: for example, there is no standard
conversion table, and Unicode has many more characters than the
traditional Japanese character sets. Correctly supporting these
traditional character sets and encodings requires a great deal of
Japan-specific knowledge and code. With UTF-8, the conversion works
automatically and only a small amount of code is needed to support it.
We also have many code-conversion tools for Japanese; once TASK writes
file names in UTF-8, we can easily convert them with such tools as we
like.

The important issue is that the current uni2ascii() function discards
information during the conversion: it checks only the even byte of
each UTF-16 code unit, and if that byte is not printable it converts
it to '?'. With my UTF-8 patch, the code information is kept
completely, and we can convert to our favorite encoding later with
other tools.

-----
TAKAHASHI, Motonobu (monyo)
mo...@ho...
http://www.monyo.com/