
12 Bit Words Compression

2015-01-06
2015-03-28
  • Stefan Menne

    Stefan Menne - 2015-01-06

    I compress a huge amount of 12-bit values. Currently I pack every two values into 3 bytes and then compress them.

    Can I improve the compression ratio? What is the best approach for this? Does LZMA work on a byte basis? Can I give LZMA a hint about the data format?
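
    For readers following along, the packing described above (two 12-bit values per 3 bytes) might look like the sketch below. The bit order and the function names `pack12` and `unpack12` are assumptions for illustration; the post does not say which layout was actually used.

```python
def pack12(values):
    """Pack 12-bit integers two per 3 bytes (assumed layout: high bits first)."""
    values = list(values)
    if len(values) % 2:
        values.append(0)  # assumption: pad an odd-length input with a zero value
    out = bytearray()
    for a, b in zip(values[0::2], values[1::2]):
        out.append(a >> 4)                        # top 8 bits of a
        out.append(((a & 0x0F) << 4) | (b >> 8))  # low 4 bits of a, top 4 bits of b
        out.append(b & 0xFF)                      # low 8 bits of b
    return bytes(out)


def unpack12(data):
    """Inverse of pack12: recover two 12-bit integers from every 3 bytes."""
    values = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i:i + 3]
        values.append((b0 << 4) | (b1 >> 4))
        values.append(((b1 & 0x0F) << 8) | b2)
    return values
```

    With this layout a value straddles a byte boundary on every second item, which is why the byte stream repeats with a 3-byte step.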

     
  • Igor Pavlov

    Igor Pavlov - 2015-01-06

    LZMA compresses better when the step is 2, 4, 8 or 16 bytes.
    So LZMA is not a good fit for such data (3-byte step).

    You can try applying a delta filter to your 12-bit values before packing the numbers into 3 bytes.
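
    A delta filter applied at the value level (rather than the byte level) could be sketched as follows. This is only one possible reading of the suggestion: it assumes a distance of one value and wrap-around modulo 4096, and the names `delta12` and `undelta12` are hypothetical.

```python
def delta12(values):
    """Replace each 12-bit value by its difference from the previous one (mod 4096)."""
    prev = 0
    out = []
    for v in values:
        out.append((v - prev) & 0xFFF)  # wrap so the result still fits in 12 bits
        prev = v
    return out


def undelta12(deltas):
    """Inverse of delta12: running sum modulo 4096."""
    prev = 0
    out = []
    for d in deltas:
        prev = (prev + d) & 0xFFF
        out.append(prev)
    return out
```

    The deltas would then be packed or padded the same way as the raw values; whether this helps depends on how smooth the data is.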

     
  • Shell

    Shell - 2015-01-06

    Here are some more approaches you can try:
    1) concatenate all of your values (as in FAT12; this may be very close to what you already do);
    2) pack each value into 2 bytes with zero or other constant padding, then select pb1 and maybe lp1 (try whether it gives any gain) for LZMA;
    3) pack every 5 values into 8 bytes, then select pb3 and maybe lp3;
    4) apply a custom Delta filter to your values (as Igor mentioned); it can be extremely useful if the differences can be encoded as 8-bit values;
    5) treat your data as 12-bit grayscale (or 4-bit RGB) values and apply an image compression algorithm to them.
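
    As an illustration of approach 2, here is a minimal sketch using Python's standard `lzma` module, which exposes the pb, lp and lc parameters through a filter chain. The little-endian layout, the preset, and the helper names (`pad16`, `compress_pad16`, `decompress_pad16`) are assumptions, not something specified in this thread.

```python
import lzma
import struct


def pad16(values):
    """Store each 12-bit value in a 16-bit little-endian word (top 4 bits zero)."""
    return struct.pack("<%dH" % len(values), *values)


def compress_pad16(values, pb=1, lp=1, lc=3):
    """Compress padded values with explicit pb/lp/lc (liblzma requires lc + lp <= 4)."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "pb": pb, "lp": lp, "lc": lc}]
    return lzma.compress(pad16(values), format=lzma.FORMAT_XZ, filters=filters)


def decompress_pad16(blob):
    """Inverse of compress_pad16: decompress and unpack the 16-bit words."""
    raw = lzma.decompress(blob)
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))
```

    Calling `compress_pad16(values, pb=2, lp=0)` would use the usual defaults instead, so the two settings can be compared on real data.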

    Some comments on #5. 12-bit grayscale is better than 4-bit RGB in the sense that most filters operate on RGB components independently. You can try lossless JPEG (especially with arithmetic coding) and PGF first; they may give you a decent compression ratio by themselves. Another option is to use PostScript. Some implementations, e.g. Ghostscript, allow you to apply TIFF or PNG filters without actual compression, so you can compress the resulting PS file with 7-Zip. /PixelDifferenceEncode (from TIFF) is essentially a Delta:1.5 filter for 12-bit data. The default PNG filters operate on bytes, but you may implement your own decoding procedures.

     
  • Stefan Menne

    Stefan Menne - 2015-03-28

    Thanks for your replies. I have done several tests myself now.
    I found that 16-bit values compress best in every case. Even with 8-bit values, LZMA works better if you add 8 bits of padding to each item, regardless of which lp, pb and lc parameters you set.
    I could never improve the compression ratio by changing lp, pb or lc away from their defaults.
    For 4-bit, 7-bit, 9-bit values, etc., the best solution is likewise always to pad up to 16 bits.
    Complex solutions such as packing five 12-bit values into 8 bytes and then using pb3, lp3 also never gave a better ratio than simply padding to 16-bit values.
    This holds at least for 2 MB packages of data: I always split my data into 2 MB blocks and then compressed each block with LZMA. And of course it holds for my type of data. Maybe things change for bigger blocks or another type of data.
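
    A comparison along these lines could be reproduced with a small harness like the one below. It is a sketch under assumptions: the 2 MB block size is applied to the encoded bytes, default `lzma.compress` settings are used, the 3-byte layout is one possible bit order, and `compress_blocks` and `layout_sizes` are hypothetical names. It reports sizes only; results depend entirely on the data.

```python
import lzma
import struct

BLOCK_SIZE = 2 * 1024 * 1024  # assumption: 2 MB of encoded bytes per block


def compress_blocks(data, block_size=BLOCK_SIZE):
    """Compress data in independent blocks with default LZMA settings; return total size."""
    return sum(len(lzma.compress(data[i:i + block_size]))
               for i in range(0, len(data), block_size))


def layout_sizes(values):
    """Return compressed sizes for 16-bit padding vs. 3-byte packing (even-length input)."""
    padded = struct.pack("<%dH" % len(values), *values)
    packed = bytearray()
    for a, b in zip(values[0::2], values[1::2]):
        packed += bytes((a >> 4, ((a & 0x0F) << 4) | (b >> 8), b & 0xFF))
    return {"pad16": compress_blocks(padded), "pack3": compress_blocks(bytes(packed))}
```

    For example, `layout_sizes(my_values)` returns a dict with the total compressed size of each layout, which can be compared directly on your own data.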

     
