[sleuthkit-developers] [ sleuthkit-Bugs-3051400 ] Invalid UTF8 sequences as file/dir names.
From: SourceForge.net <no...@so...> - 2010-09-18 13:26:17
Bugs item #3051400, was opened at 2010-08-23 05:19
Message generated for change (Comment added) made by carrier
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=477889&aid=3051400&group_id=55685

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: File System Tools
Group: None
>Status: Closed
>Resolution: Wont Fix
Priority: 5
Private: No
Submitted By: Rob J Meijer (ghede)
Assigned to: Nobody/Anonymous (nobody)
Summary: Invalid UTF8 sequences as file/dir names.

Initial Comment:
When processing the hda5 image from the honeynet test images ( http://old.honeynet.org/misc/files/challenge-images.tar ) the directory with inode 32169 when invoked with tsk_fs_dir_open_meta, produces multiple names that contain invalid UTF8 sequences.

----------------------------------------------------------------------

>Comment By: Brian Carrier (carrier)
Date: 2010-09-18 08:26

Message:
TSK is displaying what is in the image. For example, I get this on my OS X system:

fls honeypot.hda5.dd 32169
[...]
r/r 32179: Error.gif
r/r 32180: Exportova?.gif
r/r 32181: Fehler.gif
[...]

Looking inside of the directory, I see:

# icat honeypot.hda5.dd 32169| xxd
0000000: a97d 0000 0c00 0102 2e00 0000 89f5 0000  .}..............
[...]
00000e0: 4572 726f 722e 6769 6600 0000 b47d 0000  Error.gif....}..
00000f0: 1800 0e01 4578 706f 7274 6f76 61bb 2e67  ....Exportova..g
0000100: 6966 0000 b57d 0000 1400 0a01 4665 686c  if...}......Fehl

The 0xbb that is causing the problems in the later conversion is in the image. TSK is not adding it.

The likely problem is this image isn't storing its names in UTF-8. Ext2 file systems can use any single-byte encoding (I think), but file system doesn't store what encoding it uses. The local system's encoding settings are used. TSK assumes that the file system uses UTF-8 since most Linux systems use that.
These images are old and could have been created on a system that didn't use UTF-8.

I'm marking this as "Won't Fix" though because the fix would be to add encoding detection into the processing and this would apply only to a limited number of old disk images. All recent images use a standard encoding in them.

Thanks for the report!

----------------------------------------------------------------------

Comment By: Rob J Meijer (ghede)
Date: 2010-08-25 03:48

Message:
Seems like the simplest way to reproduce this is:

fls honeypot.hda5.dd 32169|iconv

----------------------------------------------------------------------

Comment By: Rob J Meijer (ghede)
Date: 2010-08-25 03:02

Message:
I tested this with the 3.1.1 and the 3.1.3 version.

----------------------------------------------------------------------

Comment By: Brian Carrier (carrier)
Date: 2010-08-23 08:19

Message:
Meaning that they are invalid in the image and TSK simply shows you the invalid sequences or that they are valid in the image and TSK makes them invalid. Is this in 3.1 or the trunk? Thanks.

----------------------------------------------------------------------
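
For readers who want to check the 0xbb byte from the hex dump themselves, here is a minimal sketch in Python. It is not part of TSK and does not use TSK's API; the function names are made up for illustration. The name bytes are copied from the xxd output in the thread. The thread never identifies the actual encoding of the image, so the ISO-8859-2 fallback is only a guess, chosen because it is a common single-byte encoding in which 0xbb is a letter.

```python
# Name bytes of the directory entry at offset 0xf4 in the xxd dump
# (name_len is 0x0e = 14 bytes): "Exportova" + 0xbb + ".gif".
RAW_NAME = bytes.fromhex("4578 706f 7274 6f76 61bb 2e67 6966")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_name(data: bytes, fallback: str = "iso-8859-2") -> str:
    """Try UTF-8 first, then a single-byte fallback.

    The fallback encoding is an assumption; ext2 does not record
    which encoding was used when the name was written.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)


if __name__ == "__main__":
    # 0xbb is a UTF-8 continuation byte with no lead byte before it,
    # so a strict UTF-8 decode of the name fails.
    print(is_valid_utf8(RAW_NAME))
    # Lossy view: the bad byte becomes U+FFFD.
    print(RAW_NAME.decode("utf-8", errors="replace"))
    # Guessed single-byte decoding.
    print(decode_name(RAW_NAME))
```

Under the ISO-8859-2 guess the name decodes to a plausible word, which fits the explanation that the image was written on a system using a non-UTF-8 locale, but it does not prove which encoding was in use.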