#32 Request to add a deterministic behavior to mksquashfs

open
nobody
None
5
2014-02-18
2013-07-05
Vincent G.
No

Hi,

I'm studying how to reduce the size of updates for an embedded system that uses squashfs.

One solution would be to use bsdiff and bspatch, which are the best binary diff and patch tools. But applied directly to the squashfs images, the patch is still too large.

So I'm looking at a pipeline like this:
old squashfs -> old tar --bspatch--> new tar --> new squashfs

The resulting patch size is really small.

The problem here is that the squashfs tool does not produce the same result twice from the same input, and I need to verify the result (with a hash).

So I investigated why.

Deflate in Zlib is officially deterministic.
liblzma2 (xz) is deterministic within a given version.
So the problem is not from these algorithms.

First, I saw that mksquashfs calls "time()" in several places. Adding a new option to force the date everywhere was very easy.
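A minimal sketch of how such a forced date could be routed through one helper, so that every former "time()" call sees the same value. The names forced_time_set, forced_time, set_forced_time and image_time are hypothetical and are not taken from the mksquashfs source or from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical state, not names from the mksquashfs source. */
static bool   forced_time_set = false;
static time_t forced_time     = 0;

/* Pin every later timestamp to t (seconds since the Epoch). */
void set_forced_time(time_t t)
{
    assert(t != (time_t)-1);   /* -1 is the error value of time() */
    forced_time = t;
    forced_time_set = true;
}

/* Intended as a replacement for time(NULL) at each call site. */
time_t image_time(void)
{
    return forced_time_set ? forced_time : time(NULL);
}
```

With this shape, the option handler would call set_forced_time() once, and each place that currently reads the clock would call image_time() instead.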

Then I found this (for the padding buffer):

           char temp[4096] = {0};

From what I understand, this only assigns the first char of the array. The remaining chars are whatever was on the call stack.

So I replaced it with this:
char temp[4096];
memset(temp, 0, sizeof(temp));
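For comparison, the C standard specifies that array elements not named in a brace-enclosed initializer are initialized as if they had static storage duration, that is, to zero. Under that rule both forms should produce an all-zero buffer, so any difference observed after this change may come from elsewhere. A small check, with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if every byte of buf[0..len) is zero, else 0. */
int all_zero(const char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/* Returns 1 if both ways of clearing a 4096-byte buffer give all zeros. */
int padding_forms_agree(void)
{
    char a[4096] = {0};        /* brace initializer: the rest is implicitly zeroed */
    char b[4096];

    memset(b, 0, sizeof(b));   /* explicit clear */
    return all_zero(a, sizeof(a)) && all_zero(b, sizeof(b))
        && memcmp(a, b, sizeof(a)) == 0;
}
```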

Now I have a deterministic mksquashfs when I use "-no-fragments", and a mostly deterministic mksquashfs when I use "-noF", on a single-processor architecture.

From what I understand, mksquashfs is using at least 6 threads : reader, writer, progress, deflator, frag_deflator and the main thread.

I printed traces of the fragment data: they change randomly from run to run, which looks like non-deterministic thread scheduling.

My request is to add a "-deterministic" option that makes mksquashfs deterministic: mksquashfs would always produce the same output for the same directory tree, files and options.

A separate program without threads, or a compile-time option to disable threads, would be just as good, I think.

Note that speed really does not matter in deterministic mode.

Thanks, and Regards,

Discussion

  • Vincent G.
    2013-07-16

    Hi,

    Here is a patch.

    The patch adds a "-deterministic" option.

    Here is what the option does:
    - it forces the use of one processor
    - it forces the date globally. The forced date is printed out and must be passed back with the "-forcedate" option in order to reproduce the same squashfs image.
    - it sorts the entries of each directory as they are read.
    - it delays the start of the fragment deflator thread. While this late start is active, the queue feeding the fragment deflator thread uses an unbounded queue implementation.

    Directory entries must be sorted because inode numbers are assigned in the order the files are discovered ("lookup_inode()" in "dirscan1()").
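To illustrate the sorting step: readdir() returns entries in an order that depends on the filesystem, so sorting the names before inode numbers are assigned gives a stable order. This is a sketch only, assuming a plain byte-wise comparison with strcmp() (which does not depend on the locale); cmp_names and sort_entries are hypothetical names, and the patch may order entries differently:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: each argument points at one element (a char pointer). */
static int cmp_names(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcmp(*sa, *sb);   /* byte-wise order, independent of locale */
}

/* Sort an array of entry names in place. */
void sort_entries(const char **names, size_t count)
{
    assert(names != NULL || count == 0);
    if (count > 1)
        qsort(names, count, sizeof(names[0]), cmp_names);
}
```

Sorting by raw bytes rather than with strcoll() keeps the result the same whatever LANG or LC_COLLATE is set to on the build machine.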

    The date given after "-forcedate" is the number of seconds since the Epoch, i.e. the output of "date '+%s'".
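A possible way to validate such an argument, as a sketch: parse_epoch is a hypothetical helper, not the code from the patch, and it assumes that negative values and trailing characters should be rejected:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>

/* Parse a decimal count of seconds since the Epoch.
 * Returns 0 and stores the value in *out on success, -1 on bad input. */
int parse_epoch(const char *arg, time_t *out)
{
    char *end = NULL;
    long long secs;

    assert(out != NULL);
    if (arg == NULL || *arg == '\0')
        return -1;

    errno = 0;
    secs = strtoll(arg, &end, 10);
    if (errno != 0 || *end != '\0' || secs < 0)
        return -1;             /* overflow, trailing junk, or negative */

    *out = (time_t)secs;
    return 0;
}
```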

    With this patch, I can copy a tree of files, use tar, or repack a mounted squashfs, and always get the same squashfs file.

    Regards,

     