[sleuthkit-users] RE: Future of indexing in Autopsy and Sleuthkit
From: Paul B. <ba...@fo...> - 2003-05-23 08:03:40
Hi Simson,

Thanks for the response.

> If you limit to printable ASCII characters, there will be problems for
> people outside the US (or people working with data outside the US). You
> need to be able to handle roman characters with accents. These are
> normally represented with high-bits. If the user searches for an e,
> they probably want to match on è and é and possibly other e's as well.
>
> Then you have the issue of Arabic, Hebrew, and 16-bit characters.
>
> At a minimum, I think that you should transparently handle codepages
> and coerce them into 7-bit ASCII. But ideally you should handle
> UNICODE, UTF-8, UTF-16, etc. Or do something for Arabic.

OK. The problem with indexed searching is that you need a limited set of
characters to search for; otherwise it's not possible to generate an index
file. The size of the index file grows exponentially with the size of the
character set.

That said, I may add the accented (high-bit) characters, but Unicode
contains far too many characters, so Unicode poses a problem.

If anyone can suggest a fix or solution, I would greatly appreciate it!
I'm still thinking about a better solution.

--
Paul Bakker
Fox-IT
Experts in IT Security!
Haagweg 137
2281 AG RIJSWIJK
T 070 336 9999
F 070 336 9990
I www.fox-it.com
E ba...@fo...

57A6 C5EA 55E4 CC1C A967 B13C F8C0 C0FB 8135 E225

Disclaimer: This email may contain confidential information. If this
message is not addressed to you, you may not retain or use the
information in it for any purpose. If you have received it in error,
please notify the sender and delete this message. We try to screen out
viruses but take no responsibility if this email contains a virus.
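
One possible way to keep the indexed character set small while still matching accented letters, as suggested in the quoted text, is to fold each character to its unaccented base before indexing. The sketch below is only an illustration and not how Autopsy or the Sleuth Kit index data. The names `fold_accents` and `ngram_key_space` are hypothetical, and the n-gram keyed index is an assumption, since the message does not say how the index file is structured.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Drop combining marks so accented letters index as their base letter.

    Characters with no decomposition (e.g. Arabic or Hebrew letters) are
    returned unchanged; this sketch does not try to map them to ASCII.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ngram_key_space(alphabet_size: int, n: int) -> int:
    """Number of distinct n-character keys an index could have to hold.

    ASSUMPTION: the index is keyed on fixed-length n-grams.
    """
    return alphabet_size ** n


if __name__ == "__main__":
    # A search for "e" can match "è" and "é" once both sides are folded.
    print(fold_accents("è é e"))

    # Possible 3-character keys for printable ASCII (95 characters),
    # an 8-bit codepage (256) and 16-bit characters (65536).
    for size in (95, 256, 65536):
        print(size, ngram_key_space(size, 3))
```

Under these assumptions, folding is applied to both the indexed data and the search term, so the index alphabet stays close to printable ASCII. Scripts without an ASCII base letter would still need a separate approach.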