Re: [sleuthkit-developers] Signature Detection Ingest Module
From: Luís F. N. <lfc...@gm...> - 2014-04-28 23:18:36
Great news, Brian, thank you.

I took a look at TikaFileTypeDetector, and it uses only the first 100 bytes of the file for detection. From the Tika.detect(byte[]) documentation: "For best results at least a few kilobytes of the document data are needed. See also the other detect() methods for better alternatives when you have more than just the document prefix available for type detection."

When reading from a stream, Tika currently reads 64 KB by default, so it can correctly detect things like "XML root elements after initial comment and DTDs" (MimeTypes documentation). IMHO, detection of zip-based types (OOXML, ODF, ...), OLE2, and the text detection heuristics would also work better.

From my Tika experience, I think detection would be better with Tika.detect(inputStream, fileName): Tika will read as many bytes of the file as it needs and will use the file name to refine the result. In some cases Tika will spool the entire stream to a temporary file for correct detection, but in the general case it reads 64 KB.

I don't think reading 64 KB instead of only 100 bytes makes a significant time difference on the high-latency spinning magnetic drives commonly used to store disk images, where the seek, not the transfer, dominates the cost.

2014-04-28 11:01 GMT-03:00 Brian Carrier <ca...@sl...>:

> Yea, the 3.1 release (which is the develop branch on GitHub) is using
> Tika's file type detection.
>
> On Apr 26, 2014, at 7:57 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
>
> > Hi all,
> >
> > As I previously mentioned, I did not see a module like this in Autopsy 3,
> > but I read somewhere it will be in Autopsy 3.1, right? Solr, under the
> > hood, uses Tika for this purpose (and the results are great) before
> > extracting text from files to index. I think explicitly using Tika for
> > detection would be good, so Autopsy could inform Solr about the detected
> > file MIME type instead of Solr re-detecting all file signatures again.
> > What do you think about it?
> >
> > Nassif
> >
> > _______________________________________________
> > sleuthkit-developers mailing list
> > sle...@li...
> > https://lists.sourceforge.net/lists/listinfo/sleuthkit-developers
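To make the 100-byte vs 64 KB point above concrete, here is a toy sketch. The class PrefixWindowDemo and its methods are hypothetical and are NOT Tika's code; the XML scanning is deliberately simplistic. It does roughly what a stream-based detector has to do: mark the stream, read a bounded window, reset it so a parser can still read from the start, and then look for the XML root element. When the document starts with a comment longer than 100 bytes, the root element falls outside a 100-byte window but inside a 64 KB one. In Tika itself, the stream-based call I am suggesting is Tika.detect(InputStream, String), which also takes the file name as a hint.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;

// Hypothetical toy, NOT Tika's implementation: shows why a 100-byte
// window can miss an XML root element that a 64 KB window finds.
public class PrefixWindowDemo {

    // Reads at most maxBytes, then resets so a later consumer
    // (e.g. a text extractor) still sees the stream from byte 0.
    static byte[] readPrefix(InputStream in, int maxBytes) throws IOException {
        in.mark(maxBytes);
        try {
            byte[] buf = new byte[maxBytes];
            int n = 0;
            while (n < maxBytes) {
                int r = in.read(buf, n, maxBytes - n);
                if (r == -1) break;
                n += r;
            }
            return Arrays.copyOf(buf, n);
        } finally {
            in.reset();
        }
    }

    // Returns the root element name, or null if it is not inside the prefix.
    // Deliberately simplistic: skips the XML declaration, comments, processing
    // instructions and simple DOCTYPEs; internal DTD subsets are not handled.
    static String rootElement(byte[] prefix) {
        String s = new String(prefix, StandardCharsets.UTF_8);
        int i = 0;
        while (true) {
            int lt = s.indexOf('<', i);
            if (lt < 0 || lt + 1 >= s.length()) return null;
            if (s.startsWith("<!--", lt)) {
                int end = s.indexOf("-->", lt + 4);
                if (end < 0) return null; // comment runs past the window
                i = end + 3;
            } else if (s.charAt(lt + 1) == '?' || s.charAt(lt + 1) == '!') {
                int end = s.indexOf('>', lt);
                if (end < 0) return null; // declaration cut off by the window
                i = end + 1;
            } else {
                // Start tag: the name ends at whitespace, '>' or '/'.
                int j = lt + 1;
                while (j < s.length() && !Character.isWhitespace(s.charAt(j))
                        && s.charAt(j) != '>' && s.charAt(j) != '/') {
                    j++;
                }
                return j < s.length() ? s.substring(lt + 1, j) : null;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Root element placed after a leading comment of about 200 bytes.
        String xml = "<?xml version=\"1.0\"?>\n<!--"
                + String.join("", Collections.nCopies(200, "x")) + "-->\n"
                + "<!DOCTYPE svg>\n<svg xmlns=\"http://www.w3.org/2000/svg\"/>";
        InputStream in = new BufferedInputStream(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println("100 B window: " + rootElement(readPrefix(in, 100)));
        System.out.println("64 KB window: " + rootElement(readPrefix(in, 64 * 1024)));
        System.out.println("first byte after reset: " + (char) in.read());
    }
}
```

Running main prints null for the 100-byte window and svg for the 64 KB window; because each read is followed by a reset, the same stream is still positioned at its first byte afterwards and can be handed to a parser.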