
#16 Implement faster compression and hashing algorithm (sha512)

open
guy
None
medium
2020-07-30
2020-07-29
dasd
No

I image on-site, either by bringing equipment to the site or by using a live CD/USB. Short imaging time is very important.

I believe the current bottlenecks in imaging, using my approach (when limited by core count or slow per-core performance), are the compression algorithm (deflate), whose speed is up to 40 MB/s per core, and hashing (unavoidable as such, although implementing AFF4 with block hashing would mitigate it).
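As a rough illustration of the deflate bottleneck, a hypothetical micro-benchmark (not Guymager code) can measure single-core zlib throughput; actual numbers depend heavily on the machine, the data and the compression level:

```python
# Hypothetical micro-benchmark of single-core deflate (zlib) throughput.
# Incompressible random data is roughly the worst case for the compressor.
import os
import time
import zlib

data = os.urandom(32 * 1024 * 1024)  # 32 MiB of incompressible data

start = time.perf_counter()
compressed = zlib.compress(data, 6)  # level 6 is zlib's default trade-off
elapsed = time.perf_counter() - start

print(f"deflate level 6: {len(data) / elapsed / 1e6:.0f} MB/s")
```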

I believe Guymager's performance would be greatly improved (especially on machines with a low number of cores) if one of the faster compression algorithms were used (Snappy, LZ4, Zstandard).

Also, linear hashing with SHA-512 on a 64-bit machine is faster than SHA-256, so I suggest implementing SHA-512 (which could be truncated to SHA-256 length as SHA-512/256).
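The relative speed claim can be checked with a small benchmark sketch (not part of Guymager). On many 64-bit CPUs SHA-512 is indeed faster, because it processes 128-byte blocks with 64-bit operations; note, however, that on CPUs with SHA-NI instructions, hardware-accelerated SHA-256 may win instead:

```python
# Hypothetical benchmark: linear hashing throughput of SHA-256 vs SHA-512.
import hashlib
import time

def mb_per_s(name, data, rounds=20):
    """Return hashing throughput in MB/s for the named algorithm."""
    h = hashlib.new(name)
    start = time.perf_counter()
    for _ in range(rounds):
        h.update(data)
    elapsed = time.perf_counter() - start
    return len(data) * rounds / elapsed / 1e6

data = b"\xaa" * (8 * 1024 * 1024)  # 8 MiB buffer
for name in ("sha256", "sha512"):
    print(f"{name}: {mb_per_s(name, data):.0f} MB/s")
```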

I also second the implementation of AFF4, which would kill two birds with one stone.

Discussion

  • guy

    guy - 2020-07-29

    Hello dasd,

    I saw that you posted the same feature request twice, so I permitted myself to delete the previous one.

    Concerning the compression algorithms: For the EWF format, I cannot simply switch to a different algorithm, as no software out there would be able to read such EWF files.

    The same is true for hashing: no algorithm other than those supported by the standard (see Joachim Metz's outstanding documentation) will ever be supported. Yes, it's true that SHA-512 is faster; however, we can't benefit from that fact as long as it's not in the specs.

    Please tell me if I have missed something. I would be happy to include it.

    I agree with you that AFF4 could be a solution. In the last two weeks, I made several changes in the source code in order to be able to implement AFF4 at a later stage. This will take some time. For the moment, I can't tell you when it will be finished.

    One of my problems is that I have no software that is able to read AFF4 files, so it's difficult to check whether images generated by Guymager are correct or not. The AFF4 reference implementation (on GitHub) has several problems; I have not yet been able to make it run without errors.

    So, when the time has come, I would have to rely on the community (including you, I hope) in order to do the tests. It surely will become a lengthy process...

    Concerning performance of imaging on machines with only a few cores: AFF4 won't perform so much better than EWF, as AFF4 is faster mainly because of its multithreaded hashing. But if there are not a lot of cores, those threads will run one after another - on the same cores! Yes, in the end you'll still get slightly better performance (more efficient compression leaves more CPU time for hashing, etc.). However, I think AFF4 will mainly show its advantages on fat machines with many cores.
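The multithreaded-hashing idea can be sketched as follows (an illustration of the concept, not AFF4 or Guymager code): the same data block is hashed by several algorithms in parallel threads. CPython's hashlib releases the GIL during large updates, so with enough free cores these run concurrently; with few cores they simply queue up on the same CPUs, as noted above.

```python
# Sketch: hashing one data block with several algorithms in parallel threads.
import hashlib
from concurrent.futures import ThreadPoolExecutor

def digest(name, data):
    """Hash the block with the named algorithm, returning (name, hexdigest)."""
    return name, hashlib.new(name, data).hexdigest()

data = b"\x00" * (4 * 1024 * 1024)  # one 4 MiB block
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(lambda n: digest(n, data), ("md5", "sha1", "sha256")))

for name, hexdigest in results.items():
    print(name, hexdigest)
```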

    For now: There's one option that might help you (at least a bit) when generating EWF files: Switch parameter EwfCompression to empty. Excerpt from guymager.cfg:

    With this setting, Guymager does no compression, except if a block contains zero bytes only. Such blocks are replaced by their compressed equivalent. Optimal settings for slow systems.

    There are 2 ways for enabling that option:
    1) Run Guymager from the command line with
    guymager-pkexec EwfCompression=Empty
    or
    sudo guymager EwfCompression=Empty
    (depending on the system you have, it's one or the other)

    2) Create a file /etc/guymager/local.cfg and put (or add) this line into it:
    EwfCompression=Empty
    and start Guymager the usual way.
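The EwfCompression=Empty policy described above could be sketched like this (an illustrative model in Python, not Guymager's actual implementation): blocks are stored uncompressed, except blocks containing zero bytes only, which are replaced by their much smaller compressed equivalent.

```python
# Illustrative sketch of an "Empty" compression policy: compress only
# all-zero blocks, store everything else as-is (no CPU spent compressing).
import zlib

def store_block(block: bytes) -> bytes:
    if block == b"\x00" * len(block):   # block contains zero bytes only
        return zlib.compress(block)     # tiny compressed representation
    return block                        # store uncompressed

zero_block = b"\x00" * 32768
data_block = b"\x01" * 32768
print(len(store_block(zero_block)))  # far smaller than 32768
print(len(store_block(data_block)))  # unchanged: 32768
```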

     
  • dasd

    dasd - 2020-07-29

    Hi Guy,

    Thanks for the reply.

    Regarding SHA-256/512: if I understand it correctly, EWF officially supports only MD5 and SHA-1. Guymager already supports SHA-256 (EWF-X supports SHA-256/512 per https://github.com/libyal/libewf/issues/107).

    If Guymager used SHA-512 and truncated the hash to the first 40 characters, the result would be the same as using SHA-256, but the hashing speed would be ca. 50% faster. This would be especially useful when verification of the image is being done, where decompression is not an issue and performance is limited by single-threaded operation.

    Also, is there a technical, or other, reason why verification of the image could not start some time after image creation begins, if you split the image into separate files and have enough cores and a fast drive (like PCIe or SSD)?

    Regarding the compression algorithm: faster compression would be great, and you, the community, or tool developers could develop plugins for their tools to be able to read the custom EWF format. I know X-Ways has in the past supported development of a plugin for AFF4.

    This also answers your question regarding reading AFF4 images: X-Ways supports it with an appropriate DLL, which is available on the web. And I would gladly support you in testing AFF4 images.

     
  • guy

    guy - 2020-07-30

    Guymager supports SHA-256, but it won't get written to the resulting EWF file (as it is not supported there). It's contained in the info file only.

    would use sha512 and would truncate the hash to first 40 characters, the result would be the same as using sha256

    I cannot confirm this... while SHA-256 and SHA-512 use similar algorithms, they produce completely different hashes, I think!? Could you please show me an example (C code, Python code, Linux shell, ...) of how to use SHA-512 for doing a faster SHA-256 calculation?
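A quick check confirms the doubt expressed above: truncating a SHA-512 digest does not reproduce SHA-256. SHA-512/256 (defined in FIPS 180-4) is a distinct variant with its own initial hash values, not a simple truncation of SHA-512.

```python
# Quick check: SHA-512 truncated to 256 bits is NOT the same as SHA-256.
import hashlib

msg = b"abc"
sha256 = hashlib.sha256(msg).hexdigest()
sha512_truncated = hashlib.sha512(msg).hexdigest()[:64]  # first 256 bits

print(sha256)
print(sha512_truncated)
print(sha256 == sha512_truncated)  # False
```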

     
  • guy

    guy - 2020-07-30

    Also, is there a technical, or other, reason why verification of the image could not start some time after image creation begins

    My reasons:

    • It's a question of looking at the image (set of segment files) as a whole. Get one job done and do the other one next. I would not feel good about starting verification on an image that has not been completely written yet.
    • Depending on the target device, this would lead to a lot of disk head movements. That would slow down the write AND the verification process. Yes, you are right for the SSDs you mentioned. However, most users store on conventional HDDs.
    • Data could possibly (even very likely) be read from the cache, i.e. verification would not check whether the data really has been stored correctly on the target device. I had several cases where the verification process detected bad HDDs: bad sectors that did not lead to errors during the write process, but where the HDDs returned different data during the later verification.

    I do not think that I'll move away from that concept of totally separated write / verification processes. There's one thing that's far more important than speed: error-free, consistent images.

     
  • dasd

    dasd - 2020-07-30

    Regarding SHA-256/512: I made a mistake and misread it. You could implement truncated SHA-512 (SHA-512/256), and that could be verified by Guymager (feature request), so there would be no need to use other tools for verification.
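For reference, many OpenSSL builds expose SHA-512/256 to Python's hashlib under the name "sha512_256"; availability depends on the build, so this sketch probes for it rather than assuming it is present:

```python
# Probe for SHA-512/256 support in this Python/OpenSSL build and, if
# available, compute a 256-bit digest with it.
import hashlib

if "sha512_256" in hashlib.algorithms_available:
    digest = hashlib.new("sha512_256", b"abc").hexdigest()
    print("SHA-512/256('abc') =", digest, f"({len(digest) * 4} bits)")
else:
    print("sha512_256 not provided by this OpenSSL build")
```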

    Thank you for explaining the reasons for separating the write and verification processes. I understand it completely; I was just thinking creatively.

     
