sleuthkit-users Mailing List for The Sleuth Kit (Page 12)
Brought to you by:
carrier
From: Luís F. N. <lfc...@gm...> - 2016-11-15 01:53:56
Hi Brian,

I didn't understand exactly how the text chunk size will help to index spaces and other characters that break words into tokens. Will you index the text twice: first with the default tokenization, breaking words at spaces and similar characters, and a second time with the whole text chunk indexed as one single token? Is 32KB the maximum Lucene token size?

I think you can do the second indexing. Indexing twice has performance consequences, so it should be configurable; users could then disable it if they do not need regex or if performance is critical. But I think you should not disable the default indexing (with tokenization), otherwise users will always have to use * as a prefix and suffix of their searches, and if they don't they will miss a lot of hits. I do not know if they will be able to do phrase searches, because Lucene does not allow * inside a phrase search (* between two " "). I do not know about Solr and whether it has extended that.

Regards,
Luis Nassif
From: <slo...@gm...> - 2016-11-15 01:39:09
I favor complex regex (option 1), enhanced with Simson’s boundary solution, if possible.
From: Tim <tim...@se...> - 2016-11-14 23:53:15
Makes sense. But yeah, if you deal with this situation properly, then you get a lot more flexibility in what block sizes and tokenization approach you use. If you don't deal with this situation, large blocks don't eliminate the problem, they just make missing things less likely.

tim
From: Simson G. <si...@ac...> - 2016-11-14 23:40:07
Hi Tim,

Take a look at the bulk_extractor paper, which explains this in detail. There is no need to index block[N-1] below, just block[N] || X bytes from block[N+1], where X is the margin.

You always need to worry about the margins, because if you don't, you double-report findings. It turns out that there are a lot of optimizations that you can implement if you do things the way I recommend below. For example, you never need to do duplicate suppression if you only index strings that begin in block[N], even if they extend into block[N+1].
From: Tim <tim...@se...> - 2016-11-14 23:31:41
Right. Why not go with 32KB blocks and then index based on overlapped windows? To index block[N], you include this string in the index:

  (block[N-1] || block[N] || block[N+1])

Then when a match occurs, you just add some logic to figure out where it actually showed up (only in margin blocks or partially in block[N]).

This is perhaps more naive than what Simson suggests, but with small blocks you don't need to worry about having the margins be much smaller than the block you're indexing.

tim

PS - I'm probably missing something here. I've been out of the game a while.
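The three-block window Tim describes can be sketched as follows. This is only an illustration under stated assumptions: it works on an in-memory Python string rather than a Solr index, the function name `search_three_block_windows` is made up for this example, and the rule used to avoid duplicate hits (keep a match only when it begins in block[N]) is borrowed from Simson's suggestion rather than taken from Tim's message.

```python
import re

def search_three_block_windows(pattern, text, block_size):
    """Search block[N-1] || block[N] || block[N+1] for every N.

    A match is reported only when it begins inside block[N], so the same
    match is not reported again when a neighbouring block is processed.
    """
    compiled = re.compile(pattern)
    hits = []
    for start in range(0, len(text), block_size):
        window_start = max(0, start - block_size)
        window = text[window_start:start + 2 * block_size]
        for match in compiled.finditer(window):
            absolute = window_start + match.start()
            # Keep the hit only if it begins in the block being indexed.
            if start <= absolute < start + block_size:
                hits.append((absolute, match.group(0)))
    return hits

# Hypothetical sample: the number straddles the boundary at offset 32.
sample = "x" * 28 + "555 1212" + "y" * 28
print(search_three_block_windows(r"\d{3}\s\d{4}", sample, 32))
```

The cost of this variant is that every block is searched roughly three times, which is the overhead Simson's smaller margin is meant to avoid.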
From: Simson G. <si...@ac...> - 2016-11-14 22:23:00
Brian,

With respect to #1 - I solved this problem with bulk_extractor by using an overlapping margin. Extend each block 1K or so into the next block. The extra 1K is called the Margin. Only report hits on string search if the text string begins in the main block, not if it begins in the margin (because then it is included entirely in the next block). You can tune the margin size to describe the largest text object that you wish to find with search.

Simson
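The margin rule described above can be sketched in a few lines. This is a minimal illustration, not bulk_extractor's actual code: the function name `search_with_margin`, the use of Python's `re` module, and the tiny block and margin sizes are all assumptions chosen to keep the example readable.

```python
import re

def search_with_margin(pattern, text, block_size, margin):
    """Search each block plus `margin` extra characters from the next block.

    A hit is reported only if it begins in the main block. A hit that begins
    in the margin is skipped here because it will be found, whole, when the
    next block is searched.
    """
    compiled = re.compile(pattern)
    hits = []
    for start in range(0, len(text), block_size):
        window = text[start:start + block_size + margin]
        for match in compiled.finditer(window):
            if match.start() < block_size:  # begins in the main block
                hits.append((start + match.start(), match.group(0)))
    return hits

# Hypothetical samples with 32-character blocks and an 8-character margin.
straddles = "x" * 28 + "555 1212" + "y" * 28   # begins in block 0, ends in block 1
in_margin = "a" * 32 + "555 1212" + "b" * 24   # begins exactly at block 1
print(search_with_margin(r"\d{3}\s\d{4}", straddles, 32, 8))
print(search_with_margin(r"\d{3}\s\d{4}", in_margin, 32, 8))
```

In both cases the number is reported exactly once, so no separate duplicate-suppression pass is needed. The margin has to be at least as long as the largest object the search is expected to find.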
From: Brian C. <ca...@sl...> - 2016-11-14 22:15:07
Making this a little more specific, we seem to have two options to solve this problem (which is inherent to Lucene/Solr/Elastic):

1) We store text in 32KB chunks (instead of our current 1MB chunks) and can have the full power of regular expressions. The downside of the smaller chunks is that there are more boundaries, so there are more places where a term could span a boundary and we could miss the hit. If we needed to, we could do some fancy overlapping. 32KB of text is about 12 pages of English text (less for non-English).

2) We limit the types of regular expressions that people can use and keep our 1MB chunks. We’ll add some logic into Autopsy to span tokens, but we won’t be able to support all expressions. For example, if you gave us “\d\d\d\s\d\d\d\d” we’d turn that into a search for “\d\d\d \d\d\d\d”, but we wouldn’t be able to support a search like “\d\d\d[\s-]\d\d\d\d”. Well, we could in theory, but we don’t want to add crazy complexity here.

So, the question is whether you’d rather have smaller chunks and the full breadth of regular expressions or a more limited set of expressions and bigger chunks. We are looking at the performance differences now, but wanted to get some initial opinions.
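The boundary problem in option 1 can be reproduced with a small sketch. It assumes plain Python strings and the `re` module in place of the real index, and the helper names `split_chunks` and `search_chunks` are hypothetical. Going from 1MB to 32KB chunks means roughly 32 times as many boundaries where this can happen.

```python
import re

def split_chunks(text, chunk_size):
    """Cut text into consecutive, non-overlapping chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def search_chunks(pattern, text, chunk_size):
    """Search every chunk on its own and return absolute start offsets."""
    compiled = re.compile(pattern)
    hits = []
    for index, chunk in enumerate(split_chunks(text, chunk_size)):
        base = index * chunk_size
        hits.extend(base + match.start() for match in compiled.finditer(chunk))
    return hits

# Hypothetical sample: a phone number at offsets 28..35.
text = "x" * 28 + "555 1212" + "y" * 28
pattern = r"\d{3}\s\d{4}"

print(search_chunks(pattern, text, 64))  # one chunk, the number is found
print(search_chunks(pattern, text, 32))  # boundary at 32 splits the number
```

With a 64-character chunk the number is found at offset 28. With 32-character chunks it is cut into "555 " and "1212" and neither half matches, which is the missed hit the message warns about.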
From: Derrick K. <dk...@gm...> - 2016-11-14 18:59:30
I tend to go with Zawinski/Lundh's mantra on this one... 'Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.' xD

Seriously though, I used to write a lot more regexes, especially for things like email addresses, credit cards, and credit card track 2 data, but that's all built in to the latest Autopsy! Yay!

My only comment is that I tend to gravitate towards Perl-style regex vs. POSIX (i.e. "\s" vs. "[:space:]") and am often searching through fixed column formats for stuff, i.e. looking at webserver or system logs where the date would be "Nov\s\s09" or "Nov\s10". If it's anything else, like looking for a phone number, then I'll tend to do whole word searches from an index (i.e. "555-1212") or a "\s?\d{3}-\d{4}" regex to find it.

Derrick
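The patterns quoted in this message can be tried directly with Python's `re` module, which accepts the same Perl-style `\s` and `\d` classes. The log lines below are invented for illustration, and `lines_matching` is a hypothetical helper, not part of Autopsy.

```python
import re

# Patterns taken verbatim from the message above.
DATE_PATTERNS = [r"Nov\s\s09", r"Nov\s10"]
PHONE_PATTERN = r"\s?\d{3}-\d{4}"

def lines_matching(patterns, lines):
    """Return the lines in which at least one pattern is found."""
    compiled = [re.compile(p) for p in patterns]
    return [line for line in lines if any(c.search(line) for c in compiled)]

# Made-up log lines in a fixed-column layout.
sample_lines = [
    "Nov  09 03:12:44 web01 GET /index.html",
    "Nov 10 11:05:02 web01 GET /login",
    "Nov 11 08:00:00 web01 call 555-1212 for support",
]

date_hits = lines_matching(DATE_PATTERNS, sample_lines)
phone_hits = [
    match.group(0).strip()
    for line in sample_lines
    for match in re.finditer(PHONE_PATTERN, line)
]
print(date_hits)
print(phone_hits)
```

Both date patterns and the phone pattern depend on whitespace, which is exactly the kind of expression the thread says an index tokenized on spaces cannot serve directly.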
From: Brian C. <ca...@sl...> - 2016-11-14 18:09:58
Autopsy currently has a limitation when searching for regular expressions: spaces are not supported. It’s not a problem for email addresses and URLs, but becomes an issue for phone numbers, account numbers, etc. This limitation comes from using an indexed search engine (since spaces are used to break text into tokens).

We’re looking at ways of solving that and need some guidance.

If you write your own regular expressions, can you please let me know and share what they look like. We want to know how complex the expressions are that people use in real life.

Thanks!
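A rough way to see the limitation is to model the index as a bag of whitespace-separated tokens and apply the regular expression to each token on its own. This is a simplified model, not the actual Lucene/Solr analyzer, and the helper names are hypothetical.

```python
import re

def tokenize(text):
    """Split on whitespace, as a stand-in for the indexing tokenizer."""
    return text.split()

def regex_hits_in_tokens(pattern, text):
    """Model of a term-level regex search: each token is tested alone."""
    compiled = re.compile(pattern)
    return [token for token in tokenize(text) if compiled.fullmatch(token)]

def regex_hits_in_text(pattern, text):
    """Regex search over the raw, untokenized text for comparison."""
    return [match.group(0) for match in re.finditer(pattern, text)]

sample = "Call 555 1212 or mail bob@example.com today"
email = r"[\w.]+@[\w.]+"
phone = r"\d{3}\s\d{4}"

print(regex_hits_in_tokens(email, sample))  # the address is a single token
print(regex_hits_in_tokens(phone, sample))  # no token contains a space
print(regex_hits_in_text(phone, sample))    # raw text still matches
```

The email address survives tokenization as one token, so it is found. The phone number is split into "555" and "1212", so a pattern that requires whitespace can never match a single token.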
From: maría e. D. <dar...@gm...> - 2016-11-08 00:41:43
|
2 2016-11-07 15:34 GMT-02:00 <sle...@li...>: > Today's Topics: > > 1. Re: Views area of Autopsy Question (John Lehr) > 2. Re: Views area of Autopsy Question (Troy Bettencourt) > > End of sleuthkit-users Digest, Vol 125, Issue 4 -- Prof. Mg. María Elena Darahuge M P Copitec 5100 |
From: Troy B. <tro...@gm...> - 2016-11-07 17:34:34
|
2 please. On Nov 7, 2016 12:30 PM, "John Lehr" <slo...@gm...> wrote: > 2 |
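The layouts being voted on in this thread can be compared with a small Python sketch. It is only an illustration under stated assumptions: the file list, the `CATEGORIES` map, and the function names are hypothetical, and nothing here reflects how Autopsy actually builds its Views tree. It builds the existing extension-based tree plus the two proposed signature-based layouts (general categories versus one node per MIME type) from the same input.

```python
import os
from collections import defaultdict

# Hypothetical (file name, detected MIME type) pairs; not real case data.
FILES = [
    ("report.doc", "application/msword"),
    ("holiday.txt", "image/jpeg"),  # extension and signature disagree
    ("notes.txt", "text/plain"),
    ("setup.exe", "application/x-msdownload"),
]

# Hypothetical grouping for option 1; the membership is illustrative only.
CATEGORIES = {
    "Images": {"image/jpeg", "image/gif", "image/bmp"},
    "Executables": {"application/x-msdownload"},
    "Documents": {"application/msword", "text/plain", "text/html"},
}


def extension_tree(files):
    """Extension-based tree: usable before any signature is known."""
    tree = defaultdict(list)
    for name, _mime in files:
        tree[os.path.splitext(name)[1].lower()].append(name)
    return dict(tree)


def category_tree(files):
    """Option 1: one node per general category, several MIME types each."""
    tree = defaultdict(list)
    for name, mime in files:
        for category, mimes in CATEGORIES.items():
            if mime in mimes:
                tree[category].append(name)
    return dict(tree)


def mime_tree(files):
    """Option 2: two-level taxonomy, media type -> subtype -> files."""
    tree = defaultdict(lambda: defaultdict(list))
    for name, mime in files:
        media_type, _, subtype = mime.partition("/")
        tree[media_type][subtype].append(name)
    return {top: dict(subs) for top, subs in tree.items()}


by_extension = extension_tree(FILES)
by_category = category_tree(FILES)
by_mime = mime_tree(FILES)

print(by_extension)
print(by_category)
print(by_mime)
```

The deliberately mismatched `holiday.txt` lands under `.txt` in the extension tree but under `image/jpeg` in the signature-based ones, which is the situation the two-tree proposal is meant to expose.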
From: John L. <slo...@gm...> - 2016-11-07 17:19:30
|
2

> On Nov 7, 2016, at 8:54 AM, Brian Carrier <ca...@sl...> wrote:
>
> Votes sent to me in private show overwhelming support for 2 trees (one based on extension and another based on signature).
>
> Now the question is how people want to see the tree based on signature organized. Please vote.
>
> 1) Nodes for images, videos, executables, documents, etc. Each node would have one or more MIME types in it. For example, Images would have all of the JPG, GIF, BMP, etc. MIME types.
>
> 2) Nodes for each MIME type. This would give a full taxonomy of the system. For example:
>
> application
> + exe
> + msword
> + x-msdownload
>
> audio
> + aiff
>
> image
> + bmp
> + jpeg
> ...
>
> text
> + html
> + plain
> ...
>
> #1 makes it easier to find general types of data, #2 allows you more fine-grained access. Preference?
>
>> On Nov 4, 2016, at 12:08 PM, Brian Carrier <ca...@sl...> wrote:
>>
>> Fair points.
>>
>> Let’s get some votes, otherwise we’ll stay with the status quo. There are three options on the table.
>>
>> 1) We intermix extensions and MIME type in the current views area and items may come and go from nodes as ingest progresses.
>>
>> 2) We have two trees. One is the current extension-based one, which is available immediately. The new one is signature-based and is available after ingest. Files would be in both of them.
>>
>> 3) We do nothing and the tree stays extension-based. If you care about getting all pictures (regardless of extension), then use the Image Gallery. If you want other MIME types, you can use “File Search By Attribute”.
>>
>> Please send me your votes (or to the list) with 1, 2, or 3 so that we can get this into the next release.
>>
>> thanks,
>> brian
>>
>>> On Nov 3, 2016, at 6:04 PM, noxdafox <nox...@gm...> wrote:
>>>
>>> The MIME type and possible subcategories (dll, com etc) are indeed relevant information but relying on them to infer the file type might be error prone.
>>>
>>> There are lots and lots of corner cases to take into account:
>>>
>>> * When opening a file, Windows (not sure about Mac OS) merely relies on the file extension. This has led to several cases in which malware would not execute if the extension was not the correct one. The most recent case I stumbled upon was a document file which would execute only if the extension was .rtf. Extensions such as .doc, .docx or .docm would not show the real behaviour of the sample.
>>> * The current trend is to rely on scripts. Locky ransomware has been spread as .js (JScript, a Microsoft JavaScript "dialect"), .jse, .vbs and .wsf. It's very hard to programmatically detect the language type with good accuracy (think about how many JavaScript dialects are out there). Users would probably see lots of text files and might lose the relevant ones.
>>> * How to deal with encrypted/packed/obfuscated files?
>>>
>>> It is true that file extensions might be abused to make investigation harder, but in several cases the file extension is a very important piece of information. My only point here is not to treat it as a second-class citizen during investigations.
>>>
>>> On 03/11/16 16:11, Brian Carrier wrote:
>>>> Another effort we have underway is to incorporate file type signatures into the Views area of Autopsy and not rely only on extension. This is a frequent request. But like many things, it gets complicated and potentially confusing to the user.
>>>>
>>>> Based on Autopsy’s philosophy of providing data as quickly as possible, the basic idea is to use a file’s extension if its MIME type is not yet known. When its MIME type becomes known, then ignore the extension and rely on the file type.
>>>>
>>>> A couple of things we’d like feedback on:
>>>>
>>>> - When the image is being ingested, we are constantly learning about file types. If we update the set of files under each type (JPEGs for example), then it would be frequently changing and this could get confusing and resource intensive. Would you prefer that it is only updated after ingest is completed or at some periodic interval (say 5 minutes)?
>>>>
>>>> - We currently break down executables in the tree into .exe, .dll, .com, etc. nodes. However, their MIME type is usually the same. Do people use the detailed breakdown of executables or would it be good enough to have a single executable node in the tree? How are people using these nodes?
>>>>
>>>> - We currently have a node in the tree for “.txt” files. If we put all files of type “text/plain” in this node, it would have TONS of files. It would almost seem to make this node useless and impossible to find stuff in. Do people ever use this node and, if so, would you like it to stay as just extension-based?
>>>>
>>>> Put another way, the current tree was easy to implement and understand when it was only extension-based. It’s not as easy when it is signature-based and we want to know how much of the current tree to keep. What types of files do you want to be able to find from the tree?
>>>>
>>>> brian
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Developer Access Program for Intel Xeon Phi Processors
>>>> Access to Intel Xeon Phi processor-based developer platforms.
>>>> With one year of Intel Parallel Studio XE.
>>>> Training and support from Colfax.
>>>> Order your platform today. http://sdm.link/xeonphi
>>>> _______________________________________________
>>>> sleuthkit-users mailing list
>>>> https://lists.sourceforge.net/lists/listinfo/sleuthkit-users
>>>> http://www.sleuthkit.org |
From: Umit K. <umi...@gm...> - 2016-11-07 17:19:01
|
The more granular, the better. I vote for 2.

Thanks,
Umit Karabiyik |
From: Nanni B. <dig...@gm...> - 2016-11-07 17:18:32
|
Number 2.

PS: Why is it impossible to export the timeline in CSV or HTML/PDF format? Personally, I hate the screenshot method ;-)

--
Dott. Nanni Bassetti
http://www.nannibassetti.com
CAINE project manager - http://www.caine-live.net |
From: Danilo M. <da...@gm...> - 2016-11-07 17:14:49
|
2. |
From: Brian C. <ca...@sl...> - 2016-11-07 16:54:38
|
Votes sent to me in private show overwhelming support for 2 trees (one based on extension and another based on signature).

Now the question is how people want to see the tree based on signature organized. Please vote.

1) Nodes for images, videos, executables, documents, etc. Each node would have one or more MIME types in it. For example, Images would have all of the JPG, GIF, BMP, etc. MIME types.

2) Nodes for each MIME type. This would give a full taxonomy of the system. For example:

application
 + exe
 + msword
 + x-msdownload

audio
 + aiff

image
 + bmp
 + jpeg
 ...

text
 + html
 + plain
 ...

#1 makes it easier to find general types of data; #2 gives you more fine-grained access. Preference? |
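The per-MIME-type layout described as option 2 in the 2016-11-07 vote can be sketched roughly as follows. This is only an illustration of the grouping idea, not Autopsy's implementation: the function names `build_mime_tree` and `render`, and the `(name, mime_type)` input shape, are assumptions made up for this example.

```python
from collections import defaultdict


def build_mime_tree(files):
    """Group (name, mime_type) pairs into {media_type: {subtype: [names]}}.

    Hypothetical helper: the input shape is an assumption for illustration.
    """
    tree = defaultdict(lambda: defaultdict(list))
    for name, mime in files:
        # "image/jpeg" -> ("image", "/", "jpeg"); a value without "/" has no subtype
        media_type, _, subtype = mime.partition("/")
        tree[media_type][subtype or "unknown"].append(name)
    # Return plain dicts with nodes and file names in alphabetical order
    return {
        media_type: {sub: sorted(names) for sub, names in sorted(subs.items())}
        for media_type, subs in sorted(tree.items())
    }


def render(tree):
    """Render the tree in the same 'type / + subtype' style used in the email."""
    lines = []
    for media_type, subtypes in tree.items():
        lines.append(media_type)
        for subtype, names in subtypes.items():
            lines.append(f"  + {subtype} ({len(names)})")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        ("a.jpg", "image/jpeg"),
        ("b.bmp", "image/bmp"),
        ("c.txt", "text/plain"),
        ("d.exe", "application/x-msdownload"),
    ]
    print(render(build_mime_tree(sample)))
```

A two-level dictionary keeps the fine-grained subtype nodes while still letting a viewer collapse everything under a top-level type such as `image`.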
From: Brian C. <ca...@sl...> - 2016-11-04 16:08:39
|
Fair points.

Let’s get some votes; otherwise we’ll stay with the status quo. There are three options on the table.

1) We intermix extensions and MIME types in the current Views area, and items may come and go from nodes as ingest progresses.

2) We have two trees. One is the current extension-based tree, which is available immediately. The new one is signature-based and is available after ingest. Files would be in both of them.

3) We do nothing and the tree stays extension-based. If you care about getting all pictures (regardless of extension), then use the Image Gallery. If you want other MIME types, you can use “File Search By Attribute”.

Please send me your votes (or to the list) with 1, 2, or 3 so that we can get this into the next release.

thanks,
brian |
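Option 2 from the 2016-11-04 message (two separate trees, with every file appearing in both) might look something like the sketch below. It is a toy model under stated assumptions: files are `(name, mime_type)` pairs, `mime_type` is `None` until ingest has identified the file, and the function names are invented for this example rather than taken from Autopsy.

```python
import os
from collections import defaultdict


def extension_tree(files):
    """Group files by lower-cased extension; needs only the file name."""
    tree = defaultdict(list)
    for name, _mime in files:
        ext = os.path.splitext(name)[1].lower() or "(none)"
        tree[ext].append(name)
    return dict(tree)


def signature_tree(files):
    """Group files by detected MIME type; files not yet identified are skipped."""
    tree = defaultdict(list)
    for name, mime in files:
        if mime is not None:
            tree[mime].append(name)
    return dict(tree)


if __name__ == "__main__":
    # Hypothetical sample: one renamed JPEG and one file ingest has not reached yet
    files = [
        ("report.rtf", "application/msword"),
        ("photo.txt", "image/jpeg"),
        ("notes.txt", None),
    ]
    print(extension_tree(files))
    print(signature_tree(files))
```

Because the extension tree depends only on names, it can be shown immediately; the signature tree fills in as MIME types become known, and a renamed file such as `photo.txt` shows up under `.txt` in one tree and `image/jpeg` in the other.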
From: noxdafox <nox...@gm...> - 2016-11-03 22:04:19
|
The MIME type and possible subcategories (.dll, .com, etc.) are indeed relevant information, but relying on them to infer the file type might be error-prone. There are lots and lots of corner cases to take into account:

* When opening a file, Windows (not sure about Mac OS) merely relies on the file extension. This has led to several cases in which malware would not execute if the extension was not the correct one. The most recent case I stumbled upon was a document file which would execute only if the extension was .rtf. Extensions such as .doc, .docx or .docm would not show the real behaviour of the sample.
* The current trend is to rely on scripts. Locky ransomware has been spread as .js (JScript, a Microsoft JavaScript "dialect"), .jse, .vbs and .wsf. It's very hard to programmatically detect the language type with good accuracy (think about how many JavaScript dialects are out there). Users would probably see lots of text files and might lose the relevant ones.
* How to deal with encrypted/packed/obfuscated files?

It is true that file extensions might be abused to make investigation harder, but in several cases the file extension is a very important piece of information. My only point here is not to treat it as a second-class citizen during investigations.

On 03/11/16 16:11, Brian Carrier wrote:
> Another effort we have underway is to incorporate file type signatures into the Views area of Autopsy and not rely only on extension. This is a frequent request. But like many things, it gets complicated and potentially confusing to the user.
>
> Based on Autopsy’s philosophy of providing data as quickly as possible, the basic idea is to use a file’s extension if its MIME type is not yet known. When its MIME type becomes known, then ignore the extension and rely on the file type.
>
> A couple of things we’d like feedback on:
>
> - When the image is being ingested, we are constantly learning about file types. If we update the set of files under each type (JPEGs for example), then it would be frequently changing and this could get confusing and resource intensive. Would you prefer that it is only updated after ingest is completed or at some periodic interval (say 5 minutes)?
>
> - We currently break down executables in the tree into .exe, dll, .com, etc nodes. However, their MIME type is usually the same. Do people use the detailed breakdown of executables or would it be good enough to have a single executable node in the tree? How are people using these nodes?
>
> - We currently have a node in the tree for “.txt” files. If we put all files of type “text/plain” in this node, it would have TONS of files. It would almost seem to make this node useless and impossible to find stuff in. Do people ever use this node and, if so, would you like it to stay as just extension-based?
>
> Put another way, the current tree was easy to implement and understand when it was only extension-based. It’s not as easy when it is signature-based and we want to know how much of the current tree to keep. What types of files do you want to be able to find from the tree?
>
> brian |
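One way to act on the "not a second-class citizen" point in the reply above, sketched here under stated assumptions and not as anything Autopsy actually does: record both the extension and the content signature for every file and flag disagreements, instead of letting one replace the other. The `EXPECTED` mapping and the function names are hypothetical and the tables are deliberately tiny; the byte signatures themselves (RTF, OLE2, ZIP) are the well-known ones.

```python
from pathlib import PurePath

# Well-known leading byte signatures; a deliberately tiny, illustrative table.
SIGNATURES = {
    b"{\\rtf": "rtf",                              # Rich Text Format
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2",   # legacy .doc container
    b"PK\x03\x04": "zip",                          # ZIP, also .docx/.docm
}

# Hypothetical mapping: which signature family each extension should carry.
EXPECTED = {".rtf": "rtf", ".doc": "ole2", ".docx": "zip", ".docm": "zip"}


def sniff(header: bytes):
    """Return the signature family of the leading bytes, or None if unknown."""
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return None


def describe(name: str, header: bytes) -> dict:
    """Keep extension and signature side by side and flag a disagreement."""
    ext = PurePath(name).suffix.lower()
    kind = sniff(header)
    return {
        "name": name,
        "extension": ext,
        "signature": kind,
        # Only extensions we have an expectation for can mismatch.
        "mismatch": ext in EXPECTED and EXPECTED[ext] != kind,
    }
```

With this shape, an RTF payload named `invoice.doc` keeps its `.doc` extension in the record and is also flagged as a mismatch, while script files such as `.js` or `.vbs` simply carry no signature and remain findable by extension.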
From: Stuart M. <st...@ap...> - 2016-11-03 21:08:44
|
Hi Brian, all, not sure if this is relevant, but I am finishing up some work on a Java-based Windows registry hive parser. The test I added to decide whether a file F 'is a hive' is based on file content, not extension, since hive files don't have any extension. In the general case, what with malicious software renaming files, I would have thought that content-based checks are a must, though I concede they slow things down. Haven't we all renamed a .tgz file to .txt to get it past a mail attachment blocker ;) Stuart |
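A content-based check of the kind described above can be as small as comparing the leading bytes of a file. The following is a minimal sketch, not the test from the parser mentioned in the message (which is not shown in this thread): it assumes only the well-known facts that registry hive files start with the four bytes `regf` and gzip streams (such as a .tgz) start with `0x1f 0x8b`. The function names are hypothetical, and a real parser would presumably validate more of the header than the signature.

```python
REGF_MAGIC = b"regf"      # first four bytes of a Windows registry hive file
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream, e.g. a .tgz


def is_hive_header(header: bytes) -> bool:
    """True if the leading bytes carry the registry hive signature."""
    return header[:4] == REGF_MAGIC


def is_gzip_header(header: bytes) -> bool:
    """True if the leading bytes carry the gzip signature."""
    return header[:2] == GZIP_MAGIC


def read_header(path: str, size: int = 8) -> bytes:
    """Read only the first few bytes, so the check stays cheap per file."""
    with open(path, "rb") as f:
        return f.read(size)
```

Reading a handful of bytes per file keeps the cost low compared with full content analysis, and a .tgz renamed to .txt is still recognised because the check never looks at the name.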
From: Richard C. <rco...@ba...> - 2016-11-03 20:53:09
|
One clarification: Autopsy currently uses Solr 4, not Solr 5.

On Thu, Nov 3, 2016 at 9:35 AM, Brian Carrier <ca...@sl...> wrote:
> As mentioned last week at OSDFCon, we are undertaking an effort right now to reexamine keyword searching in Autopsy. We built it with an old version of Solr 5 years ago and a lot has changed. One of the things that we are looking into is if we should change to Elastic. We are making a proof of concept system that uses it to evaluate its performance and such compared to the latest Solr.
>
> We are looking for feedback from people who have a strong opinion about this. As of right now, it isn’t clear what we gain by moving to Elastic (and some say we’ll get a performance decrease from it during ingest for standalone deployments) for the current Autopsy features (text search). But, there is a theory that if we put more data into the index (times and other metadata) that other module writers could do some cool stuff with it (though that data is already in the SQLite database).
>
> Basic question is, If we simply upgrade to Solr 6 and make some schema changes to take advantage of new features, who would be sad that we didn’t jump to Elastic and why?
>
> brian |
From: Brian C. <ca...@sl...> - 2016-11-03 14:11:59
|
Another effort we have underway is to incorporate file type signatures into the Views area of Autopsy and not rely only on extension. This is a frequent request. But like many things, it gets complicated and potentially confusing to the user.

Based on Autopsy’s philosophy of providing data as quickly as possible, the basic idea is to use a file’s extension if its MIME type is not yet known. When its MIME type becomes known, then ignore the extension and rely on the file type.

A couple of things we’d like feedback on:

- When the image is being ingested, we are constantly learning about file types. If we update the set of files under each type (JPEGs for example), then it would be frequently changing and this could get confusing and resource intensive. Would you prefer that it is only updated after ingest is completed or at some periodic interval (say 5 minutes)?

- We currently break down executables in the tree into .exe, .dll, .com, etc. nodes. However, their MIME type is usually the same. Do people use the detailed breakdown of executables or would it be good enough to have a single executable node in the tree? How are people using these nodes?

- We currently have a node in the tree for “.txt” files. If we put all files of type “text/plain” in this node, it would have TONS of files. It would almost seem to make this node useless and impossible to find stuff in. Do people ever use this node and, if so, would you like it to stay as just extension-based?

Put another way, the current tree was easy to implement and understand when it was only extension-based. It’s not as easy when it is signature-based and we want to know how much of the current tree to keep. What types of files do you want to be able to find from the tree?

brian |
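The fallback rule proposed above (use the extension until the MIME type is known, then rely on the MIME type) can be sketched as a single lookup function. This is a minimal illustration, not Autopsy code: the function name, the category names and both lookup tables are hypothetical, and the MIME strings are only example values.

```python
from typing import Optional

# Hypothetical, tiny lookup tables for illustration only.
EXT_TO_CATEGORY = {
    ".jpg": "Images",
    ".jpeg": "Images",
    ".txt": "Text",
    ".exe": "Executables",
    ".dll": "Executables",
}
MIME_TO_CATEGORY = {
    "image/jpeg": "Images",
    "text/plain": "Text",
    "application/x-dosexec": "Executables",  # example value for PE files
}


def view_category(extension: str, mime_type: Optional[str]) -> str:
    """Pick the tree node for a file.

    While the MIME type is unknown (None), fall back to the extension.
    Once it is known, ignore the extension entirely.
    """
    if mime_type is not None:
        return MIME_TO_CATEGORY.get(mime_type, "Other")
    return EXT_TO_CATEGORY.get(extension.lower(), "Other")
```

One consequence this makes visible: a file can move between nodes during ingest (for example, a `.jpg` that turns out to be `text/plain`), which is why the question of when to refresh the tree matters.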
From: Brian C. <ca...@sl...> - 2016-11-03 13:35:35
|
As mentioned last week at OSDFCon, we are undertaking an effort right now to reexamine keyword searching in Autopsy. We built it with an old version of Solr five years ago and a lot has changed. One of the things that we are looking into is whether we should change to Elastic. We are making a proof-of-concept system that uses it to evaluate its performance and such compared to the latest Solr.

We are looking for feedback from people who have a strong opinion about this. As of right now, it isn’t clear what we gain by moving to Elastic (and some say we’ll get a performance decrease from it during ingest for standalone deployments) for the current Autopsy features (text search). But there is a theory that if we put more data into the index (times and other metadata), other module writers could do some cool stuff with it (though that data is already in the SQLite database).

The basic question is: if we simply upgrade to Solr 6 and make some schema changes to take advantage of new features, who would be sad that we didn’t jump to Elastic, and why?

brian |
From: Derrick K. <dk...@gm...> - 2016-10-31 17:15:05
|
Right on, thanks for sharing that out! Derrick

On Mon, Oct 31, 2016 at 10:38 AM, Stuart Maclean <st...@ap...> wrote:
> Hi all,
>
> I presented some work on efficient whole disk capture and analysis at the recent OSDFCon. I also managed to bundle the code and some docs and release up onto github:
>
> https://github.com/UW-APL-EIS/tupelo
>
> Please see the 'develop' branch at this point, the master is not prime-time ready.
>
> A copy of the osdfcon slides are referenced at the foot of the README.
>
> Comments welcomed.
>
> Stuart |
From: Nanni B. <dig...@gm...> - 2016-10-31 17:12:26
|
Hi guys, I'm glad to announce that the new CAINE is out ;) http://www.caine-live.net -- Dott. Nanni Bassetti http://www.nannibassetti.com CAINE project manager - http://www.caine-live.net |
From: Stuart M. <st...@ap...> - 2016-10-31 16:52:09
|
Hi all, I presented some work on efficient whole disk capture and analysis at the recent OSDFCon. I also managed to bundle the code and some docs and release them on GitHub: https://github.com/UW-APL-EIS/tupelo Please see the 'develop' branch at this point; the master branch is not yet ready for prime time. A copy of the OSDFCon slides is referenced at the foot of the README. Comments welcomed. Stuart |