RE: [lc-devel] WKdm Problem
From: Scott K. <sfk...@am...> - 2006-04-17 18:27:48
Nitin,

> One particular thing I'm trying to improve is to minimize
> extra memory it requires during compression. Presently,
> it stores tags and other bits one per byte (in case of
> partial or exact match) and later packs them together.
> I think this packed output can be directly generated
> without this intermediate step -- this should greatly
> reduce its memory footprint.

I think you're optimizing the wrong thing here. We chose to defer the packing of bits in this manner intentionally: it improves the algorithm's speed substantially (by more than 3x). In exchange, we did need temporary space to store the bits, but the arrays required still leave WK* using less book-keeping space than any other compressor we encountered. For example, LZO needs a 64 KB dictionary, while WKdm uses something like 20 KB.

Moreover, real systems have tens to hundreds of thousands of page frames (e.g., 256 MB, small by current standards, is 65,536 page frames). Using 5 (WKdm) to 16 (LZO) pages for temporary data structures accounts for only 0.008% to 0.02% of total main memory, respectively. Shrinking the footprint will yield no performance improvement -- the noise in measuring running time will be much larger than the performance gained by increasing main memory capacity by 0.02% (in the best case!). Meanwhile, you will have slowed compression substantially, increasing the cost of keeping pages compressed and thus limiting the ideal compressed cache size. You will have lost much more in overall performance than you gained. (A rough sketch of the two-pass structure in question is in the P.S. at the end of this message.)

If you want to improve the WK* algorithms, try:

(a) A 32- or 64-word dictionary
(b) Varying the associativity (not just direct-mapped and 4-way set associative)
(c) Hand-coding it in assembly for speed

> Does WK4x4 give better compression (than WKdm) for textual
> data also (since I'm currently working on compressing only
> page-cache data)?

Somewhat, but not especially. Both assume that the fundamental input symbol for compression is a word, not a byte. They look for regularities in integer and pointer values, which is not what most file data looks like.

> I think better compression with a slightly slower algorithm
> will be better than lower compression with a somewhat faster algorithm.

Please see my dissertation:

  http://www.cs.amherst.edu/~sfkaplan/papers/sfkaplan-dissertation.ps.gz

I've already done this analysis. Basically, you can choose either improved speed or tightness, and either one will comparably improve overall performance, although through different compressed cache sizes. Saying that one is more desirable than the other is nonsensical -- both are equally valuable here. What you *do* want is the ability either (a) to improve compressibility for a minimal decrease in speed, or (b) to improve speed for a minimal decrease in compressibility. If you give up too much of one for the other, you've lost.

> Can you please suggest any other alternatives to LZO for
> compressing file-system pages?

The closest match is LZRW1, but it's plainly not as fast or tight as LZO. There may be newer ones out there, but I'm not aware of anything as fast as LZO in the Lempel-Ziv family, and speed is critical here.

> In particular, I'm looking at SquashFS (a compressed
> file-system) which uses 'zlib' for compression and is very
> fast and stable (it's widely used for LiveCDs).

Zlib uses the modified LZ77 that is the basis of gzip. It's faster than most file-system-level compressors, but it's substantially slower than LZO or WK*.
It's possible that processors are fast enough now that zlib will provide *some* benefit, but you will definitely be losing out on the potential gains that LZO (or something like it) could provide on byte-oriented data.

Scott
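
P.S. In case a concrete picture helps, the two-pass structure described above looks roughly like the sketch below. This is illustrative only: the 16-word direct-mapped dictionary, the toy hash, and the field widths are placeholder assumptions, not the actual WKdm source, and it leaves out the side arrays that record dictionary indices and the low bits of partial matches.

    #include <stdint.h>
    #include <stddef.h>

    #define DICT_WORDS  16     /* assumed: small direct-mapped dictionary  */
    #define PAGE_WORDS  1024   /* a 4 KB page treated as 1024 32-bit words */

    enum { TAG_MISS = 0, TAG_EXACT = 1, TAG_PARTIAL = 2, TAG_ZERO = 3 };

    /* Toy hash: a few middle bits of the word select the dictionary slot. */
    unsigned dict_index(uint32_t w) { return (w >> 10) & (DICT_WORDS - 1); }

    /* Pass 1: scan the page, writing one tag per *byte* of the tags array
     * (values 0-3) and saving missed words whole.  The byte-per-tag array
     * is the "extra memory" under discussion; keeping it unpacked avoids
     * shifting and masking on every word during the scan.  The misses
     * buffer must have room for PAGE_WORDS entries. */
    void model_page(const uint32_t *page, uint8_t *tags,
                    uint32_t *misses, size_t *n_misses)
    {
        uint32_t dict[DICT_WORDS] = { 0 };
        size_t nm = 0;

        for (size_t i = 0; i < PAGE_WORDS; i++) {
            uint32_t w = page[i];
            unsigned d = dict_index(w);

            if (w == 0) {
                tags[i] = TAG_ZERO;
            } else if (dict[d] == w) {
                tags[i] = TAG_EXACT;
            } else if ((dict[d] >> 10) == (w >> 10)) { /* high bits match */
                tags[i] = TAG_PARTIAL;
                dict[d] = w;
            } else {
                tags[i] = TAG_MISS;
                misses[nm++] = w;                      /* keep whole word */
                dict[d] = w;
            }
        }
        *n_misses = nm;
    }

    /* Pass 2: pack the byte-per-tag array down to 2 bits per tag, four
     * tags per output byte (n is assumed to be a multiple of 4).
     * Deferring this work to a separate pass is the speed/space
     * trade-off in question.  Returns the packed size in bytes. */
    size_t pack_tags(const uint8_t *tags, size_t n, uint8_t *out)
    {
        size_t o = 0;
        for (size_t i = 0; i < n; i += 4)
            out[o++] = (uint8_t)(tags[i]            |
                                 (tags[i + 1] << 2) |
                                 (tags[i + 2] << 4) |
                                 (tags[i + 3] << 6));
        return o;
    }

Folding the packing into the first pass would eliminate the byte-per-tag array, but it moves the shift-and-mask work into the inner scan loop, which is exactly the slowdown described above.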