Thread: [sleuthkit-users] Future of indexing in Autopsy and Sleuthkit
From: Paul B. <ba...@fo...> - 2003-05-21 08:22:44
Attachments:
PGPexch.htm.pgp
Hello,

As some people may already know, I am in the process of adding an Indexed Search feature to Autopsy and Sleuthkit, which are Open Source filesystem forensic tools.

I have some issues concerning these additions and would like community members' opinions on them. So anyone who uses Autopsy/Sleuthkit, or just wants to give an opinion: feel free to do so, and let me know whether I should or should not implement these changes.

Issue 1:
I think it is advisable to limit the indexed character range to alphanumeric characters only, instead of the current range of all printable ASCII characters. The consequences are the following:
- POSITIVE: The index files are smaller. (Currently they are the size of the strings file of an image, which is quite large if you have just copied an 80 GB partition.)
- NEGATIVE: Indexed searching on other characters will no longer be possible.
- POSITIVE: It will be easier to search for substrings of words, which is not supported at the moment. (It could be supported with either character range, but would take a huge amount of extra space with the original character range.)
- POSITIVE: Searching will be even quicker.

Issue 2:
Human readability of the files. Indexed searching can be sped up, and the size of the index files reduced, by changing the format of the index files. The consequence is that they can no longer be read by a human (no more text-format files). The consequences are the following:
- POSITIVE: Speed of searches is increased.
- POSITIVE: Size of the index files is reduced.
- NEGATIVE: Files can no longer be checked with the human eye.

For the moment these are the issues. Maybe more will come.

--
Paul Bakker
Fox-IT Experts in IT Security!
Haagweg 137
2281 AG RIJSWIJK
T 070 336 9999
F 070 336 9990
I www.fox-it.com
E ba...@fo...
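To make the Issue 1 trade-off concrete, here is a minimal sketch in Python. It is not the Autopsy/Sleuthkit implementation: the function name `build_index`, the two regular expressions, and the minimum run length of 4 are all assumptions chosen for illustration. It indexes the same bytes once with the printable-ASCII range and once with the alphanumeric-only range.

```python
import re
from collections import defaultdict

# Hypothetical patterns for illustration; not taken from Autopsy or Sleuthkit.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")  # runs of printable ASCII, 4+ bytes
ALNUM_RUN = re.compile(rb"[A-Za-z0-9]{4,}")      # runs of alphanumerics, 4+ bytes


def build_index(data: bytes, pattern: re.Pattern) -> dict:
    """Map each matched run to the byte offsets at which it starts."""
    index = defaultdict(list)
    for match in pattern.finditer(data):
        index[match.group()].append(match.start())
    return dict(index)


# A tiny stand-in for a disk image: text fragments separated by binary bytes.
sample = b"\x00\x01mail alice@example.com now\x00\xffreport2003\x00"

printable_index = build_index(sample, PRINTABLE_RUN)
alnum_index = build_index(sample, ALNUM_RUN)

print(sorted(printable_index))  # whole printable runs, '@' and '.' included
print(sorted(alnum_index))      # separate words only; '@' is never indexed
```

In this sketch the alphanumeric index holds short individual words, which is what makes word and substring lookups cheap, while the address `alice@example.com` can only be found by looking up its parts separately.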
From: Simson L. G. <si...@lc...> - 2003-05-22 15:30:28
Paul,

Here are some issues you may not have considered:

> Issue 1:
> I think it is advisable to limit the indexed character range to only
> alphanumeric characters instead of the current limitation of all
> printable ASCII characters.

If you limit to printable ASCII characters, there will be problems for people outside the US (or people working with data from outside the US). You need to be able to handle Roman characters with accents. These are normally represented with the high bit set. If the user searches for an e, they probably want to match on è and é and possibly other e's as well.

Then you have the issue of Arabic, Hebrew, and 16-bit characters.

At a minimum, I think that you should transparently handle codepages and coerce them into 7-bit ASCII. But ideally you should handle Unicode, UTF-8, UTF-16, etc. Or do something for Arabic.

> Issue 2:
> Human readability of the files. A speedup in the indexed searching
> process and a reduction of the size of the used files can be
> accomplished by changing the format of the index files. The
> consequence is that these cannot be read by a human anymore (No more
> text-format file). The consequences are the following:
> - POSITIVE: Speed of searches is increased
> - POSITIVE: Size of used files is reduced
> - NEGATIVE: Files cannot be checked anymore with the human eye.

I do not think that this is important. The index files should be in binary; create a tool to browse or view them.
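The "coerce codepages into 7-bit ASCII" suggestion could look roughly like the following Python sketch. The helper names `fold_text` and `fold_bytes` and the default codepage `latin-1` are assumptions for illustration only. The approach shown is Unicode decomposition (NFKD) followed by dropping combining marks, which is one possible folding technique, not necessarily the one meant in the message.

```python
import unicodedata


def fold_text(text: str) -> str:
    """Fold accented Latin letters to their base letter and normalise case."""
    decomposed = unicodedata.normalize("NFKD", text)  # 'è' -> 'e' + combining grave
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def fold_bytes(raw: bytes, codepage: str = "latin-1") -> str:
    """Decode bytes from an assumed codepage, then fold them for indexing."""
    return fold_text(raw.decode(codepage, errors="replace"))


print(fold_text("è é e"))                            # e e e
print(fold_bytes(b"caf\xe9"))                        # cafe
print(fold_bytes("crème".encode("utf-8"), "utf-8"))  # creme
```

Applying the same folding to both the indexed text and the search term lets a query for "e" match è and é. Letters that have no decomposition, such as base Arabic or Hebrew letters, pass through unchanged here, so they would still need an index that accepts non-ASCII terms.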
From: Matthew M. S. <mm...@ta...> - 2003-05-23 16:18:56
Paul Bakker wrote:

> Hello,
>
> As some people may already know, I am in the process of adding an Indexed Search feature to Autopsy and Sleuthkit, which are Open Source filesystem forensic tools.
>
> I have some issues that concern these additions and I would like to get community members' opinions on some of these. So anyone who is using Autopsy/Sleuthkit or just wants to give his/her opinion: Feel free to give your opinion and let me know if I should or should not implement these features/issues.
>
> Issue 1:
> I think it is advisable to limit the indexed character range to only alphanumeric characters instead of the current limitation of all printable ASCII characters. The consequences are the following:
> - POSITIVE: The size of the used index files is smaller (Now it's the size of the strings file of an image) Which is quite huge if you have just copied a 80 Gb partition.
> - NEGATIVE: Indexed Searching on other characters will not be possible anymore.
> - POSITIVE: It will be easier to search for substrings of words, which is not yet possible at the moment. (It is possible in both versions, but will take a huge extra space if used on the original character range)
> - POSITIVE: Searching will be even quicker.

Paul, is it just me, or do I read that as alphanumeric only? I often need to search for instances of email addresses, and while it is not always mandatory, having access to the @ symbol sure does speed the process up.

> Issue 2:
> Human readability of the files. A speedup in the indexed searching process and a reduction of the size of the used files can be accomplished by changing the format of the index files. The consequence is that these cannot be read by a human anymore (No more text-format file). The consequences are the following:
> - POSITIVE: Speed of searches is increased
> - POSITIVE: Size of used files is reduced
> - NEGATIVE: Files cannot be checked anymore with the human eye.
>
> For the moment these are the issues. Maybe more will come..

Not an issue in my opinion. In fact, I agree with another post that mentioned making the file layout open; someone here will write a tool to read it.

-----------------------------------------------------------------
This list is provided by the SecurityFocus ARIS analyzer service.
For more information on this free incident handling, management
and tracking system please see: http://aris.securityfocus.com
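As a rough illustration of the "binary index plus an open layout and a reader tool" idea raised in the replies on Issue 2, here is a minimal Python sketch. The record layout, the `HEADER` structure, and the function names `write_index` and `dump_index` are all hypothetical; nothing here reflects an actual Autopsy or Sleuthkit file format.

```python
import io
import struct

# Hypothetical record layout (little-endian), invented for this example:
#   uint16 term length, uint32 offset count, term bytes, then uint64 offsets.
HEADER = struct.Struct("<HI")


def write_index(index: dict, stream) -> None:
    """Write each term and its byte offsets as one binary record."""
    for term in sorted(index):
        offsets = index[term]
        stream.write(HEADER.pack(len(term), len(offsets)))
        stream.write(term)
        stream.write(struct.pack(f"<{len(offsets)}Q", *offsets))


def dump_index(stream):
    """Yield (term, offsets) pairs; this is the 'viewer' half of the format."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            break  # clean end of file
        term_len, count = HEADER.unpack(header)
        term = stream.read(term_len)
        offsets = list(struct.unpack(f"<{count}Q", stream.read(8 * count)))
        yield term, offsets


buffer = io.BytesIO()
write_index({b"alice": [7, 4096], b"report2003": [30]}, buffer)
buffer.seek(0)
for term, offsets in dump_index(buffer):
    print(term.decode("ascii"), offsets)
```

Because the layout is documented in a few lines, the dump function doubles as the verification tool: the binary file is not readable by eye, but anyone can regenerate a text listing from it.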