OsVisNet / News: Recent posts

New files posted

An email XML schema has already been proposed within the XMTP protocol, but it is oversimplified for the purposes of this project because it has just a single element for each header line.

Another, more detailed schema has been derived from the output of the free mbox2XML Thunderbird mailbox archiver. Mbox2xml outputs XML files which are linked to a comprehensive XML stylesheet for viewing the archived emails and attachments offline. To accomplish this, extra fields that are not required for network visualisation are generated...
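To illustrate what "a single element for each header line" means, here is a minimal Python sketch. It is not the XMTP schema or the mbox2XML schema: the root element name `message` and the choice to reuse each header name as the tag name are assumptions made purely for illustration.

```python
import email
import xml.etree.ElementTree as ET


def headers_to_xml(raw_message: str) -> ET.Element:
    """Build a flat XML tree with one child element per header line.

    The root tag "message" is a hypothetical name, not taken from XMTP.
    """
    msg = email.message_from_string(raw_message)
    root = ET.Element("message")
    for name, value in msg.items():
        # Each header becomes one element; the header value is kept as
        # unstructured text, which is what makes this layout "flat".
        child = ET.SubElement(root, name)
        child.text = value
    return root


SAMPLE = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Test\n"
    "\n"
    "Body text\n"
)

root = headers_to_xml(SAMPLE)
print(ET.tostring(root, encoding="unicode"))
```

In a flat layout like this, a header with several recipients stays a single text string, so a visualisation tool would still have to parse it to recover individual addresses.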

Posted by heartbeat348 2010-03-24

New dataset posted

I have prepared what I believe is a new Enron dataset by extracting the attachments from the .pst mailboxes. First, I used readpst to convert the mails to mailbox format. Then I used ripmime to extract the attachments, producing over 300,000 files occupying more than 30 GB. ripmime increments the filename if an identical one already exists in the target directory, to prevent overwriting. I then computed MD5 hashes of the attachment files so that files can be identified accurately, not just by filename. The resulting tab-delimited text file is 20,407,699 bytes in size and has the format...
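The hashing step can be sketched in Python as follows. This is an illustration only, not the script used to build the dataset: the function names are hypothetical, and the two-column layout (MD5 hash, then relative path) is an assumption, since the actual format of the posted file is not shown here.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_hash_index(attachment_dir, index_path) -> int:
    """Write one "hash<TAB>relative path" line per file; return the file count.

    The column layout is assumed for this sketch. Keep index_path outside
    attachment_dir so the index file itself is not hashed.
    """
    root = Path(attachment_dir)
    count = 0
    with open(index_path, "w", encoding="utf-8", newline="") as out:
        for path in sorted(p for p in root.rglob("*") if p.is_file()):
            rel = path.relative_to(root).as_posix()
            out.write(f"{md5_of_file(path)}\t{rel}\n")
            count += 1
    return count
```

Because ripmime renames colliding files, two copies of the same attachment can end up under different names; matching on the hash column rather than the filename is what lets them be recognised as the same file.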

Posted by heartbeat348 2010-02-28

From acquisition to visualisation in < 10 mins

A video has been posted on YouTube (search for OsVisNet) showing the transformation from a .pst file to a network relationship diagram in less than 10 minutes.

A lot has happened since I last wrote: some good, some not so good and some positively awful!

1) Following advice from Simson Garfinkel, after proving concepts with random data, I am now working with real data sets and have chosen the Enron Email Corpus as the first step. This has been put in the public domain since the famous court cases and is available in different formats...

Posted by heartbeat348 2010-02-12

Help Wanted

A great many Open Source tools are already available to assist examiners in the collection of evidence. These provide outputs in many different formats, proprietary or otherwise. There are also tools freely available to:
• store the data in a single database
• perform Extraction, Transformation and Loading (ETL) on this data
• visualise datasets and perform network analysis on them
The challenge is to bring these outputs together and be able to present them in a uniform visual manner to assist the analyst.
Progress
I have performed an initial evaluation of the suitability of some of these tools to try to establish the best way forward. My findings are:
1. The structure of data obtained from forensic collection tools makes it difficult to model in a traditional relational database.
2. XML seems the best way to store and manipulate data during the ETL stage.
3. Input is required from experienced analysts to either evaluate or provide requirements for the visualisation tool.
I am now in the process of documenting my findings in more detail and trying to find parties who may be interested in helping to move the project forward. If you are unable to assist yourself, I would be most grateful if you could let me know of anyone you think may be interested, either by forwarding this mail to them or by letting me have their contact details.

Posted by heartbeat348 2010-01-22