
Best archive compression ratio

mesa mesa
2018-07-23
2019-01-03
  • mesa mesa

    mesa mesa - 2018-07-23

    Hi,

    I've been experimenting with various formatting settings in reiser4...

    My use case is a long-term archive hard disk that will rarely be changed or accessed, so access/transfer speed and fragmentation are a low priority. Storage density is what I'm mostly after. It is a mixed store of large media files, text files, binary data and everything else in between, compressed and uncompressed, encrypted and unencrypted. By volume the data is heavily skewed towards media files, but only because of their relatively large size.

    So far I've found that the formatting options 'tails' vs. 'smart' make little to no difference to storage space; 'smart' seems to make effective decisions without forcing 'tails'. The default cluster size of 64K beats all the others, as would be expected. gzip wins for best compression over lzo and zstd (only by a small margin in the case of zstd), with surprisingly little penalty on large write transfer speed. Compress modes 'force' and 'conv' gave the same or very similar results, but 'ultim' was very poor in comparison with regard to compression ratio.

    So basically I've found for best storage density, the only setting to change from defaults, so far, is gzip over lzo.

    I'm guessing the more complicated settings of node, key and hash have little to no effect in my tests, or do they? Any suggestions on these, as they are the more in-depth and theoretically complicated aspects of the system, and probably less tested as well? I'm less familiar with them, so I was looking for general advice before I go and read up to understand them at a deeper level.

    Hopefully this is the appropriate forum to ask such questions, if not, apologies and I'll be happy to repost elsewhere.

    Many thanks for your efforts in general.

     
  • Edward Shishkin

    Edward Shishkin - 2018-07-23

    Hello,

    The stable version of Reiser4 includes 2 modules (file plugins) which manage regular files:

    1. unix file plugin:

    Doesn’t use compression. Stores file bodies either as fragments in the tree or as sets of unformatted blocks (extents), depending on the formatting policy (mkfs option “formatting=[tails, extents, smart]”). “tails” means storing the file body as a set of fragments (tails) in the tree; “extents” means storing the file body as sets of unformatted blocks (extents). “smart” means storing small (<=20K) files as fragments and large (>20K) files as extents. The “smart” mode keeps track of file size and automatically converts tails to extents and back. The “smart” mode is the default.
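As a rough illustration of the formatting policies described above, here is a minimal Python sketch. It is not Reiser4 code: the function name `storage_form` is hypothetical, and reading “20K” as 20 * 1024 bytes is an assumption.

```python
# Toy model of the unix-file plugin's formatting policy (not Reiser4 code).
# Assumption: "20K" in the post is read as 20 * 1024 bytes.
TAIL_LIMIT = 20 * 1024


def storage_form(file_size: int, policy: str = "smart") -> str:
    """Return 'tails' or 'extents' for a file of file_size bytes."""
    if policy in ("tails", "extents"):
        # Fixed policies ignore the file size.
        return policy
    if policy == "smart":
        # Small files are stored as fragments, large ones as extents.
        return "tails" if file_size <= TAIL_LIMIT else "extents"
    raise ValueError(f"unknown formatting policy: {policy}")
```

For example, `storage_form(4096)` gives `'tails'`, while `storage_form(5 * 1024 * 1024)` gives `'extents'`.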

    2. cryptcompress file plugin:

    Stores file bodies only as fragments in the tree. Depending on the compression mode (mkfs option “compressMode=(conv, force, ...)”), it may or may not apply a compression transform to each individual logical chunk (cluster) of the file’s body. Compressed (or uncompressed) clusters, in turn, get chopped into fragments and tightly packed into the tree at commit time, right before writing to disk. In the “conv” compression mode, intelligent switching happens at 2 interfaces (high: “FILE”, and low: “COMPRESSION_TRANSFORM”). If the first logical cluster (64K by default) turns out to be incompressible, then management of that file is passed forever to the unix-file plugin with the “formatting=extents” policy. Otherwise, the cryptcompress file plugin keeps track of the compressibility of each individual logical cluster. If the compression ratio is bad, then compression is turned off at the low level (the logical cluster simply doesn’t get compressed before being chopped into fragments). After this, the cryptcompress file plugin checks every 2nd, 4th, 8th, etc. cluster. Once one is compressible, compression is turned on again (at the low level!).
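The “conv” switching described above can be modelled in a few lines of Python. This is only a toy sketch under stated assumptions, not the kernel logic: the `BAD_RATIO` threshold, the exact back-off schedule, and the names `compressible` and `conv_decisions` are all hypothetical, and zlib stands in for the real compression transform.

```python
import os
import zlib
from typing import List

CLUSTER = 64 * 1024  # default logical cluster size mentioned in the post
BAD_RATIO = 0.9      # assumption: "bad" = compressed size above 90% of original


def compressible(chunk: bytes) -> bool:
    """True if zlib (level 1) shrinks the chunk below the BAD_RATIO threshold."""
    return len(zlib.compress(chunk, 1)) < BAD_RATIO * len(chunk)


def conv_decisions(data: bytes) -> List[str]:
    """Label each cluster 'compressed', 'raw' or 'handed-off' in the toy model."""
    clusters = [data[i:i + CLUSTER] for i in range(0, len(data), CLUSTER)]
    if not clusters:
        return []
    if not compressible(clusters[0]):
        # High-level switch: the whole file goes to the unix-file plugin.
        return ["handed-off"] * len(clusters)
    decisions = []
    compressing = True
    skip = 0      # clusters still to be stored raw before the next check
    interval = 1  # back-off interval; doubles to 2, 4, 8, ... (assumed schedule)
    for chunk in clusters:
        if not compressing and skip > 0:
            skip -= 1
            decisions.append("raw")
            continue
        if compressible(chunk):
            # Low-level switch back on.
            compressing = True
            interval = 1
            decisions.append("compressed")
        else:
            # Low-level switch off, and wait longer before checking again.
            compressing = False
            interval *= 2
            skip = interval - 1
            decisions.append("raw")
    return decisions
```

Feeding it one zero-filled cluster, one random cluster and two more zero-filled clusters shows compression being switched off and then back on at the next check.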

    Comment. The “conv” compression mode is the default. It works very well for a mix of well-compressible and incompressible files of medium size. An example is a root partition, which contains system files, various development environments (folders with sources and executables), etc. The “conv” compression mode works badly for large media files, as they always have leading well-compressible zeros, while the file as a whole compresses badly. Thus, substantial overheads specific to the cryptcompress file plugin are incurred (due to storing a large file as a set of fragments in the tree). In particular, it will use 2x more memory because of managing a secondary page cache for uncompressed data, take a long time to remove such a file, etc.
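To see why a well-compressible first cluster can mislead a first-cluster check, here is a small Python sketch with a synthetic stand-in for a media file (one zero-filled 64K cluster followed by random bytes). The data layout and the name `cluster_ratios` are assumptions for illustration only, and zlib stands in for the file system's transform.

```python
import os
import zlib

CLUSTER = 64 * 1024  # default logical cluster size mentioned in the post


def cluster_ratios(data: bytes, level: int = 1):
    """Compressed/original size ratio for each 64K cluster of data."""
    ratios = []
    for i in range(0, len(data), CLUSTER):
        chunk = data[i:i + CLUSTER]
        ratios.append(len(zlib.compress(chunk, level)) / len(chunk))
    return ratios


# Synthetic "media file": leading zeros, then incompressible payload.
fake_media = bytes(CLUSTER) + os.urandom(15 * CLUSTER)
ratios = cluster_ratios(fake_media)
whole_file_ratio = len(zlib.compress(fake_media, 1)) / len(fake_media)

print(f"first cluster ratio: {ratios[0]:.4f}")
print(f"worst later cluster: {max(ratios[1:]):.4f}")
print(f"whole file ratio:    {whole_file_ratio:.4f}")
```

The first cluster shrinks to almost nothing while the file as a whole barely shrinks, so a decision based on the first cluster keeps such a file under the cryptcompress plugin.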

    Unfortunately, there is no efficient way for the file system to automatically recognize large media files, so I usually recommend storing such files on separate partitions for which compression is turned off (mkfs option “create=reg40”).

    NOTE: the formatting policy (mkfs option “formatting=xxx”) is meaningful only for the unix-file plugin and doesn’t make sense for the cryptcompress file plugin. On the other hand, the compression mode (mkfs option “compressMode=xxx”) is meaningful only for the cryptcompress file plugin and doesn’t make sense for the unix-file plugin.

    So, for your case (mostly read-accessed archives) I would recommend “mkfs.reiser4 -o compress=gzip1” (intelligent compression with the gzip1 transform), plus, if possible, a separate partition with compression turned off for large media files. Other mkfs options aren’t worth attention.
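The two-partition layout suggested above could be scripted roughly as follows. This Python sketch only builds and prints the command lines and does not run them. The device names `/dev/sdX1` and `/dev/sdX2` are placeholders and the helper `mkfs_reiser4_argv` is hypothetical; the option strings are the ones quoted in this post.

```python
import shlex


def mkfs_reiser4_argv(device: str, options: str) -> list:
    """Build (but do not run) an mkfs.reiser4 command line."""
    return ["mkfs.reiser4", "-o", options, device]


# Placeholder devices: substitute the real partitions before running anything.
archive_cmd = mkfs_reiser4_argv("/dev/sdX1", "compress=gzip1")  # general archive data
media_cmd = mkfs_reiser4_argv("/dev/sdX2", "create=reg40")      # large media, no compression

for cmd in (archive_cmd, media_cmd):
    print(shlex.join(cmd))
```

Since mkfs destroys existing data on the target, it seems safest to review the printed lines by hand rather than execute them automatically.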

    Thanks,
    Edward.

     
  • Alfonso , Alias josetes

    Hi everybody. I would like to tell you about a project related to this forum. I would like to work with you and other programmers on creating a new Linux SS.00, based on Reiser4 for file management and Vedic maths for the calculations. Please answer if you are interested in collaborating on the project.
    Thanks so much.

    Alfonso Hernandez

    • 34 678 239 521
     
  • Edward Shishkin

    Edward Shishkin - 2019-01-03

    Sorry, not interested.
    I also have a lot of ideas, and it is not clear who will implement them.

    Thanks,
    Edward.

     
