Re: [sleuthkit-users] ntfs.c.patch
From: TAKAHASHI M. <mo...@ho...> - 2003-02-24 16:40:23
Hello, I am the author of ntfs.c.patch.

>> Mr. Takahashi wrote a patch which converts file
>> names into Unicode (UTF-8).
>> http://damedame.monyo.com/ntfs.c.patch
>>
>> However, it is not a complete patch.
>> If I use a long file name, a buffer overflow occurs.
>
> This is because the bounds on the 'asc' array are not being checked.

Yes, I know this patch is incomplete; a quick fix is to triple the
length of the "char *asc" buffer. It is also easy to add an option
that controls whether the UTF-8 feature is enabled. I have done both
in my private version of TASK.

The reason I did not create a complete patch is that I do not want to
distribute and maintain another (Japanese-localized) version of TASK;
I hope that TASK will support UTF-8 natively. Also, which character to
use for the UTF-8 option and how to store that option in TASK are
matters of coding policy.

In Japan, Japanese filenames are commonly used for business, and the
Japanese version of Windows also creates Japanese filenames by
default, so we strongly need support for them.

>> What happens when you run this on the command line? Does the shell
>> display the Japanese symbols? For example:
>>
>>   fls -f ntfs img.dd
>
> http://www.port139.co.jp/task/jp.dd (Size: 8 MB NTFS image)
>
> A shell cannot display Japanese characters (Unicode).
> Therefore we have to use Autopsy and a web browser.
> Most browsers support Unicode (UTF-8): IE, Netscape, Opera...

Indeed, as Hideaki said, UTF-8 is not widely used as an encoding for
Japanese. Most 'Japanese' shells can handle UTF-8 only as an opaque
8-bit byte string, and can display only the traditional Japanese
encodings such as EUC-JP and Shift_JIS. But using UTF-8 has several
merits:

- UTF-8 supports all characters (scripts) that NTFS supports.
- ASCII characters pass through UTF-8 unchanged.
- The conversion between UTF-8 and UTF-16 (UCS-2) is reversible.
For historical reasons, the conversion between the traditional
Japanese encodings (and character sets) and Unicode is not reversible
and has severe, complex problems: for example, there is no standard
conversion table, and Unicode has many more characters than the
traditional Japanese character sets. Correctly supporting these
traditional character sets and encodings requires a great deal of
Japan-specific knowledge and code. With UTF-8, the conversion works
automatically and only a small amount of code is needed to support it.
We also have many code-conversion tools for Japanese; once TASK writes
file names in UTF-8, we can easily convert them with such tools as we
like.

The important issue is that the current uni2ascii() function discards
information during the conversion: it checks only the even byte of
each UTF-16 code unit, and if that byte is not printable it converts
it to '?'. With my UTF-8 patch, the code information is kept
completely, and we can convert to our favorite encoding later with
other tools.

-----
TAKAHASHI, Motonobu (monyo)
mo...@ho...
http://www.monyo.com/