Re: [sleuthkit-users] Charset encoder error when processing mbox files
From: Derrick K. <dk...@gm...> - 2018-12-11 20:51:33
Hello. I'm not sure what else can be done without seeing the data. I don't even think going into Autopsy's "Help -> About -> Activate verbose logging" will help, but you can give it a shot. Autopsy uses Tika's CharsetDetector, which I believe comes straight from ICU4J, and this could be an upstream issue in Tika, as it seems very specific to your data. I understand about not being able to share your data, though!

As a way to isolate this, how about splitting your mbox into a zillion individual mboxes and running Autopsy against the split versions to see if a specific culprit message can be found? The procmail package has the 'formail' utility, which can do the splitting for you, e.g.:

dk@anubis:/tmp/bleck$ mkdir splitmbox
dk@anubis:/tmp/bleck$ cat mbox | formail -ds sh -c 'cat > splitmbox/msg.$FILENO'

Derrick

On Tue, Dec 11, 2018 at 11:18 AM <hyl...@is...> wrote:
>
> > Is it possible to share your data at all?
>
> Unfortunately no, as it is proprietary client information. I can share
> Autopsy log files, though, or anything of that nature (system log files,
> etc.). What can I provide that would be helpful?
>
> > Derrick
> >
> >
> > On Tue, Dec 11, 2018, 00:21 Joseph Hylkema <hyl...@is... wrote:
> >
> >> Okay, here's what I did:
> >>
> >> I changed the contents of /etc/default/locale to remove the hard-coded
> >> Italian references in that file, changed the default locale to
> >> en_US.UTF-8, and got the same locale output as Derrick did.
> >>
> >> I then attempted to re-run the ingest... and got the same error.
> >>
> >> I then upgraded to Autopsy 4.9.1... and got the same error.
> >>
> >> I then installed Autopsy 4.9.1 in a Mint test VM, spun it up, ran it
> >> against the data... and got the same error.
> >>
> >> I am wondering if maybe I should just punt and install all of the
> >> locales? After all, this data has God-only-knows what character
> >> encoding in it.
> >>
> >> So, it's probably not a CAINE issue. It could be an issue with the
> >> data itself.
> >> Perhaps I could import it into Thunderbird (read-only and
> >> off-network) and see if there is any strange encoding in it.
> >>
> >> Thoughts?
> >>
> >> On Mon, 2018-12-10 at 18:57 -0700, Derrick Karpo wrote:
> >> > Hi Joseph.
> >> >
> >> > I've attached my locale output below:
> >> >
> >> > dk@anubis:~$ locale
> >> > LANG=en_CA.utf8
> >> > LANGUAGE=en_CA:en
> >> > LC_CTYPE="en_CA.utf8"
> >> > LC_NUMERIC="en_CA.utf8"
> >> > LC_TIME="en_CA.utf8"
> >> > LC_COLLATE="en_CA.utf8"
> >> > LC_MONETARY="en_CA.utf8"
> >> > LC_MESSAGES="en_CA.utf8"
> >> > LC_PAPER="en_CA.utf8"
> >> > LC_NAME="en_CA.utf8"
> >> > LC_ADDRESS="en_CA.utf8"
> >> > LC_TELEPHONE="en_CA.utf8"
> >> > LC_MEASUREMENT="en_CA.utf8"
> >> > LC_IDENTIFICATION="en_CA.utf8"
> >> > LC_ALL=
> >> > dk@anubis:~$ locale charmap
> >> > UTF-8
> >> >
> >> > I tested my system under Autopsy 4.9.0 and 4.9.1 and both ran fine.
> >> > While I'm not convinced we are on the right track with the locales
> >> > stuff, we could try something:
> >> >
> >> > $ sudo dpkg-reconfigure locales (generate "en_US.UTF-8" and set it
> >> > as the default locale)
> >> > <log out of the Caine X session>
> >> > $ locale (make sure it's all "en_US.utf8")
> >> > <test Autopsy again>
> >> >
> >> > Derrick
> >> >
> >> > On Sun, Dec 9, 2018 at 10:18 PM <hyl...@is...> wrote:
> >> > >
> >> > > > Hi Joseph.
> >> > > >
> >> > > > This question might be better asked directly to Nanni, as it
> >> > > > sounds like it may be Caine-specific! I just tested mbox parsing
> >> > > > under Debian testing w/Autopsy 4.9.1 and didn't have any issues
> >> > > > with keyword searches.
> >> > > >
> >> > > > While I don't have a copy of Caine to test with at the moment, I
> >> > > > wonder if it's a manifestation of your system's locale. If you
> >> > > > fire up a terminal emulator, can you send the output from 'locale'
> >> > > > and 'locale charmap'?
> >> > > > From MboxParser.java:111 in Autopsy, it looks like if it can't
> >> > > > detect the character encoder, it'll throw that message, but I
> >> > > > could be way off base here.
> >> > >
> >> > > Hi Derrick,
> >> > >
> >> > > Thank you very much for the quick reply. Below is the output of
> >> > > 'locale':
> >> > >
> >> > > jhylkema@caine-vm:~$ locale
> >> > > LANG=en_US.UTF-8
> >> > > LANGUAGE=en_US
> >> > > LC_CTYPE="en_US.UTF-8"
> >> > > LC_NUMERIC=it_IT.UTF-8
> >> > > LC_TIME=it_IT.UTF-8
> >> > > LC_COLLATE="en_US.UTF-8"
> >> > > LC_MONETARY=it_IT.UTF-8
> >> > > LC_MESSAGES="en_US.UTF-8"
> >> > > LC_PAPER=it_IT.UTF-8
> >> > > LC_NAME=it_IT.UTF-8
> >> > > LC_ADDRESS=it_IT.UTF-8
> >> > > LC_TELEPHONE=it_IT.UTF-8
> >> > > LC_MEASUREMENT=it_IT.UTF-8
> >> > > LC_IDENTIFICATION=it_IT.UTF-8
> >> > > LC_ALL=
> >> > >
> >> > > And below is the output of 'locale charmap':
> >> > >
> >> > > jhylkema@caine-vm:~$ locale charmap
> >> > > UTF-8
> >> > >
> >> > > If I were a betting man, my money would be on the fact that LC_ALL
> >> > > isn't set. Is that environment variable set in your Debian test
> >> > > distro?
> >> > >
> >> > > I will also email Nanni.
> >> > >
> >> > > Thank you.
> >> > >
> >> > > >
> >> > > > Derrick
> >> > > > On Sun, Dec 9, 2018 at 1:18 AM Joseph Hylkema <hyl...@is...>
> >> > > > wrote:
> >> > > > >
> >> > > > > Hi all,
> >> > > > >
> >> > > > > First post to the list.
> >> > > > >
> >> > > > > I am trying to use Autopsy to run some keyword searches on mbox
> >> > > > > files downloaded from Gmail. Unfortunately, Autopsy returns an
> >> > > > > error: "Error while processing: Could not find appropriate
> >> > > > > charset encoder."
> >> > > > > I am running Autopsy on Caine 10 in a KVM VM with 8GB RAM on a
> >> > > > > Lenovo P51 with a Core i7 processor.
> >> > > > >
> >> > > > > Any help would be appreciated.
> >> > > > >
> >> > > > > --
> >> > > > > "Far better it is to dare mighty things, to win glorious
> >> > > > > triumphs, even though checkered by failure, than to take rank
> >> > > > > with those poor spirits who neither enjoy much nor suffer much,
> >> > > > > because they live in the gray twilight that knows neither
> >> > > > > victory nor defeat."
> >> > > > >
> >> > > > > -- Theodore Roosevelt, "The Strenuous Life."
> >> > > > >
> >> > > > > _______________________________________________
> >> > > > > sleuthkit-users mailing list
> >> > > > > https://lists.sourceforge.net/lists/listinfo/sleuthkit-users
> >> > > > > http://www.sleuthkit.org
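As a complement to Derrick's formail split, here is a rough shell sketch for surveying which MIME charsets the messages declare, so that rare or malformed names stand out before feeding individual messages to Autopsy. It is only a text-level grep over 'charset=' parameters (it will also count occurrences that appear in message bodies); it does not reproduce how Autopsy or Tika's CharsetDetector chooses an encoding, which works statistically on the raw bytes. The helper names 'list_charsets' and 'messages_with_charset' are hypothetical, and 'splitmbox/msg.*' is just the naming from the formail example. It assumes a grep that supports -o and -h (GNU, BSD, and BusyBox grep all do).

```shell
#!/bin/sh
# Rough charset survey for an mbox or for the split messages produced by
# formail. list_charsets and messages_with_charset are hypothetical helper
# names for this sketch; they are not part of Autopsy, Tika, or procmail.

# Print each distinct declared charset with a count, most common first.
# Matching is case-insensitive; a leading quote is stripped and names are
# lowercased so that "UTF-8" and utf-8 are counted together.
list_charsets() {
    grep -hioE 'charset="?[A-Za-z0-9_.:-]+' "$@" \
        | sed -E 's/^[^=]*="?//' \
        | tr '[:upper:]' '[:lower:]' \
        | sort | uniq -c | sort -rn
}

# Print the names of the files that declare the given charset.
# Usage: messages_with_charset CHARSET FILE...
messages_with_charset() {
    charset=$1
    shift
    # The value must be followed by a quote, a semicolon, whitespace, or
    # the end of the line, so "utf-8" does not also match "utf-8x".
    grep -ilE "charset=\"?${charset}(\"|;|[[:space:]]|\$)" "$@"
}

# With arguments, survey those files; with none, only define the
# functions (so the file can also be sourced into an interactive shell).
if [ "$#" -gt 0 ]; then
    list_charsets "$@"
fi
```

For example, saving it under a hypothetical name such as charsets.sh and running 'sh charsets.sh splitmbox/msg.*' prints the charset counts; any rare or odd-looking name is a reasonable first candidate, and 'messages_with_charset NAME splitmbox/msg.*' (after sourcing the file) lists the messages to try in Autopsy on their own.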