Thread: [sleuthkit-users] Re: Future of indexing in Autopsy and Sleuthkit


If you are feeling ambitious, why not give the option to the user? Take
the suggestions you receive from this list to determine the default
behavior of the application, and then give the user the option of
changing that behavior if desired. In my opinion, one of the largest
benefits of using open source software is its flexibility.

Matt Bergen
Lead Information Security Officer
Wyoming Department of Employment

>>> "Simson L. Garfinkel" <si...@lc...> 05/22/03 09:27AM >>>
Paul,

Here are some issues you may not have considered:
>
> Issue 1:
> I think it is advisable to limit the indexed character range to only
> alphanumeric characters instead of the current limitation of all
> printable ASCII characters.

If you limit to printable ASCII characters, there will be problems for
people outside the US (or people working with data outside the US). You
need to be able to handle roman characters with accents. These are
normally represented with high-bits. If the user searches for an e,
they probably want to match on è and é and possibly other e's as well.
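
As a rough illustration of matching an e against its accented forms, here
is a minimal Python sketch. It is not anything Autopsy or Sleuthkit does
today; the names fold_accents and matches are hypothetical. It assumes the
text has already been decoded to Unicode, and it relies on NFKD
normalization from the standard unicodedata module to split an accented
letter into a base letter plus a combining mark.

```python
import unicodedata


def fold_accents(text):
    """Reduce accented letters to their base letters for matching."""
    # NFKD splits "è" into "e" followed by a combining grave accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep every other character as-is.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query, candidate):
    """Case-insensitive, accent-insensitive substring test."""
    return fold_accents(query).lower() in fold_accents(candidate).lower()
```

Letters that have no decomposition (for example ø) pass through unchanged,
so a real implementation would likely need an extra mapping table.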

Then you have the issue of Arabic, Hebrew, and 16-bit characters.

At a minimum, I think that you should transparently handle codepages
and coerce them into 7-bit ASCII. But ideally you should handle
UNICODE, UTF-8, UTF-16, etc. Or do something for Arabic.
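
The "at a minimum" option could look something like the sketch below. The
function name to_ascii_key and the default codepage are assumptions made
for illustration only. The raw bytes are decoded with a caller-supplied
codepage, and anything that cannot be expressed in 7-bit ASCII is then
discarded.

```python
import unicodedata


def to_ascii_key(raw, codepage="latin-1"):
    """Decode raw bytes with the given codepage and coerce to 7-bit ASCII."""
    # Undecodable bytes become U+FFFD, which is dropped again below.
    text = raw.decode(codepage, errors="replace")
    # Separate base letters from their accents before filtering.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that fit in 7-bit ASCII.
    return decomposed.encode("ascii", errors="ignore").decode("ascii")
```

Note the limitation: Arabic or Hebrew text comes out as an empty key with
this approach, which is why indexing Unicode directly would be the better
long-term answer.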
>
> Issue 2:
> Human readability of the files. A speedup in the indexed searching
> process and a reduction of the size of the used files can be
> accomplished by changing the format of the index files. The
> consequence is that these cannot be read by a human anymore (no more
> text-format files). The consequences are the following:
>  - POSITIVE: Speed of searches is increased
>  - POSITIVE: Size of used files is reduced
>  - NEGATIVE: Files cannot be checked anymore with the human eye.

I do not think that this is important. The index files should be in
binary; create a tool to browse or view them.
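
To make the idea concrete, here is a small Python sketch of a binary index
plus a viewer. The record layout, the IDX1 signature, and the function
names are all invented for this example and are not the actual Sleuthkit
index format. Each record stores a term followed by the byte offsets where
it was found.

```python
import io
import struct

MAGIC = b"IDX1"  # hypothetical file signature


def write_index(stream, index):
    """Write a {term: [offsets]} mapping as little-endian binary records."""
    stream.write(MAGIC)
    stream.write(struct.pack("<I", len(index)))
    for term in sorted(index):
        encoded = term.encode("utf-8")
        offsets = index[term]
        # Term length (16-bit), term bytes, offset count (32-bit).
        stream.write(struct.pack("<H", len(encoded)))
        stream.write(encoded)
        stream.write(struct.pack("<I", len(offsets)))
        # Each offset is stored as an unsigned 64-bit integer.
        stream.write(struct.pack("<%dQ" % len(offsets), *offsets))


def dump_index(stream):
    """Return one human-readable line per term in the binary index."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an index file")
    (term_count,) = struct.unpack("<I", stream.read(4))
    lines = []
    for _ in range(term_count):
        (term_len,) = struct.unpack("<H", stream.read(2))
        term = stream.read(term_len).decode("utf-8")
        (count,) = struct.unpack("<I", stream.read(4))
        offsets = struct.unpack("<%dQ" % count, stream.read(8 * count))
        lines.append("%s\t%s" % (term, ",".join(str(o) for o in offsets)))
    return lines
```

Running dump_index over a file written by write_index gives back the same
information a text-format index would have shown, so the human check is
still possible, just through a tool.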

-----------------------------------------------------------------
This list is provided by the SecurityFocus ARIS analyzer service.
For more information on this free incident handling, management
and tracking system please see: http://aris.securityfocus.com