
Filters: are they really necessary?

2007-12-14
2013-05-30
  • Nobody/Anonymous

    I tried using the sparc filter (--sparc --lzma) on a 110 MB test.tar file generated from these directories of my system:
    /usr/X11R6/ /usr/bin/ /usr/lib* /usr/sbin/

    The filter improves compression by about 4-5% compared with the default.

    While the compression improvement is nice, wouldn't filters add too much complexity to the file format, encoder and decoder for only a little gain? Wouldn't it be better to provide only the lzma filter ("Keep it Simple" principle)? A single architecture-specific filter may also not be a very smart approach for tar files, which usually contain many file types.
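The comparison described in this post can be reproduced roughly as follows. This is a minimal sketch, not the original test: it assumes Python's standard `lzma` module (a binding to the later liblzma) instead of the 2007 command-line tool, it uses LZMA2 in the `.xz` container rather than the format discussed in this thread, and the helper names `compress_with`, `compressed_size` and `filter_gain` are made up for illustration.

```python
import lzma


def compress_with(data: bytes, branch_filter=None) -> bytes:
    """Compress data to .xz, optionally putting a branch filter before LZMA2."""
    chain = []
    if branch_filter is not None:
        # Branch (BCJ) filters such as SPARC must come before the final LZMA2 filter.
        chain.append({"id": branch_filter})
    chain.append({"id": lzma.FILTER_LZMA2, "preset": 6})
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=chain)


def compressed_size(data: bytes, branch_filter=None) -> int:
    """Return the compressed size in bytes for the chosen filter chain."""
    return len(compress_with(data, branch_filter))


def filter_gain(data: bytes, branch_filter=lzma.FILTER_SPARC) -> float:
    """Return the size reduction in percent relative to LZMA2 alone.

    A negative value means the filter made the result larger, which can
    happen when the input is not code for the matching architecture.
    """
    plain = compressed_size(data)
    filtered = compressed_size(data, branch_filter)
    return 100.0 * (plain - filtered) / plain
```

Calling `filter_gain(open("test.tar", "rb").read())` on a tar file like the one above would give a percentage comparable to the 4-5% figure, though the exact number depends on the data and the preset.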

     
    • Lasse Collin

      Lasse Collin - 2007-12-15

      The new file format is definitely much more complex than the LZMA_Alone format or gzip's file format. The original plan about 18-20 months ago was quite simple, but Igor Pavlov and I thought that a few more features would be useful in the long term.

      The new file format has two variants: single-block and multi-block. The single-block variant is quite simple and can be supported with a small amount of code. A decoder that supports only the single-block variant with only the LZMA filter won't be a lot bigger than an LZMA_Alone format decoder.

      It's the multi-block support that is the single biggest complication. The zlib-like stateful API also makes the code a bit bigger; several things would be much more straightforward with a callback API. Supporting multiple filters has less effect on the complexity and size of the file format handling code in liblzma, so a single-filter format wouldn't have been a lot simpler.
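The difference between the two API styles can be illustrated with a toy example. The sketch below is not liblzma code: it decodes a made-up run-length format (pairs of count byte and value byte), and every name in it is hypothetical. It only shows why a stateful decoder has to remember where it stopped, while a callback decoder can simply ask for more input in the middle of its loop.

```python
import io
from typing import Callable, Optional


def rle_decode_callback(read: Callable[[int], bytes],
                        write: Callable[[bytes], None]) -> None:
    """Callback style: the decoder owns the loop and pulls input itself.

    read(n) is assumed to return exactly n bytes unless the input has ended.
    """
    while True:
        pair = read(2)
        if len(pair) < 2:
            return
        write(bytes([pair[1]]) * pair[0])


class RleDecoderStateful:
    """Stateful (zlib-like) style: the caller owns the loop and may hand in
    chunks that end anywhere, so the position inside a pair must be saved."""

    def __init__(self) -> None:
        # Count byte already seen while its value byte has not arrived yet.
        self._pending_count: Optional[int] = None

    def decode(self, chunk: bytes) -> bytes:
        out = bytearray()
        for byte in chunk:
            if self._pending_count is None:
                self._pending_count = byte
            else:
                out += bytes([byte]) * self._pending_count
                self._pending_count = None
        return bytes(out)


def decode_bytes_callback(data: bytes) -> bytes:
    """Convenience wrapper that drives the callback decoder from memory."""
    src, out = io.BytesIO(data), bytearray()
    rle_decode_callback(src.read, out.extend)
    return bytes(out)
```

Even in this tiny format the stateful version needs an extra field and a branch per byte; in a real decoder every place where input can run out needs the same kind of bookkeeping.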

      For tar files, there's a plan to have a special filter that detects file types inside the tar archive and enables different filters as needed. This could be used by default so users wouldn't need to think about it. But before I even consider implementing it, I have to get the basic features finished and stable.
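How such detection might work is not specified in this thread. As a purely hypothetical sketch of the idea, the code below walks a tar archive with Python's `tarfile` module, looks at the ELF header of each regular file, and picks a matching branch filter ID from Python's `lzma` module. The mapping table and the function names `pick_filter` and `filters_for_tar` are assumptions for illustration, not the planned design.

```python
import io
import lzma
import struct
import tarfile

# ELF e_machine values mapped to branch filter IDs from Python's lzma module.
ELF_MACHINE_TO_FILTER = {
    2: lzma.FILTER_SPARC,     # EM_SPARC
    3: lzma.FILTER_X86,       # EM_386
    20: lzma.FILTER_POWERPC,  # EM_PPC
    40: lzma.FILTER_ARM,      # EM_ARM
    43: lzma.FILTER_SPARC,    # EM_SPARCV9
    50: lzma.FILTER_IA64,     # EM_IA_64
    62: lzma.FILTER_X86,      # EM_X86_64
}


def pick_filter(header: bytes):
    """Return a branch filter ID for an ELF header, or None for anything else."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # EI_DATA at offset 5: 1 means little-endian, 2 means big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both 32-bit and 64-bit ELF.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINE_TO_FILTER.get(machine)


def filters_for_tar(tar_bytes: bytes) -> dict:
    """Map each regular member name to the chosen filter ID (or None)."""
    choices = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar:
            if member.isfile():
                header = tar.extractfile(member).read(20)
                choices[member.name] = pick_filter(header)
    return choices
```

A real implementation would also have to switch filters between members inside one compressed stream, which this sketch does not attempt.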

       
