Thread: RE: [lc-devel] WKdm Problem
From: Scott K. <sfk...@am...> - 2006-04-17 02:13:39
Nitin,

I'm the `K' in WK, so I might be some kind of help. (Then again, I might not!)

WKdm was written only to be research-grade: It worked well enough for experimental uses. So, I'm not surprised that you're having a bit more difficulty with it. Can you give us any more information about the errors? Do the mangled pages have `minor' errors (just a few bytes that are incorrect), or are there large chunks of the page that are incorrect? Is it possible for the compression/decompression to be interrupted? It was not written to support that.

Nitin Gupta wrote:
> WKdm is fastest with nice compression ratios so it'll be
> nice to have it working instead of WK4x4

I will note, in case you care, that WK4x4 could be much faster if it were re-written from the WKdm code-base. WKdm had some hand-optimizing (at the C level) that WK4x4 did not enjoy. It's possible that WK4x4 could give its better compression ratios with a (de)compression time that is close to WKdm's.

Rodrigo de Castro wrote:
> Remember that WKdm is very nice with swap cache data, but not
> with general page cache data.

Right. The WK algorithms were written with VM-based (primarily heap) pages in mind. Something like LZO is likely to do better with file system pages.

Scott
From: Scott K. <sfk...@am...> - 2006-04-17 18:27:48
Nitin,

> One particular thing I'm trying to improve is to minimize
> extra memory it requires during compression. Presently,
> it stores tags and other bits one per byte (in case of
> partial or exact match) and later packs them together.
> I think this packed output can be directly generated
> without this intermediate step -- this should greatly
> reduce its memory footprint.

I think you're optimizing the wrong thing here. We chose to defer the packing of bits in this manner intentionally: it improves the algorithm's speed substantially (by more than 3x). In exchange, we did need temporary space to store the bits, but the arrays required still leave WK* using less book-keeping space than any other compressor we encountered. For example, LZO needs a 64 KB dictionary, while WKdm uses something like 20 KB.

Moreover, real systems have tens to hundreds of thousands of pages (e.g. 256 MB, small by current standards, is 65,536 page frames). Using 5 (WKdm) to 16 (LZO) pages for temporary data structures accounts for 0.008% to 0.02% of total main memory, respectively. Shrinking that footprint will yield no performance improvement -- the noise in measuring running time will be much larger than the performance gained by increasing main memory capacity by 0.02% (in the best case!). Meanwhile, you will have slowed compression substantially, increasing the cost of keeping pages compressed and thus limiting the ideal compressed cache size. You will have lost much more in overall performance than you gained.

If you want to improve the WK* algorithms, try:

  (a) a 32- or 64-word dictionary,
  (b) varying the associativity (not just direct-mapped and 4-way set
      associative), or
  (c) hand-coding it in assembly for speed.

> Does WK4x4 give better compression (than WKdm) for textual
> data also (since I'm currently working on compressing only
> page-cache data)?

Somewhat, but not especially. Both assume that the fundamental input symbol for compression is a word, not a byte. They look for regularities in integer and pointer values, which is not what most file data is going to look like.

> I think better compression with a slightly slower algorithm
> will be better than lower compression with a somewhat faster algorithm.

Please see my dissertation:

  http://www.cs.amherst.edu/~sfkaplan/papers/sfkaplan-dissertation.ps.gz

I've already done this analysis. Basically, you can choose either improved speed or tightness, and either one will comparably improve overall performance, although through different compressed cache sizes. Saying that one is more desirable than the other is nonsensical -- both are equally valuable here. What you *do* want is the ability either (a) to improve compressibility for a minimal decrease in speed, or (b) to improve speed with a minimal decrease in compressibility. If you give up too much of one for the other, you've lost.

> Can you please suggest any other alternatives to LZO for
> compressing file-system pages?

The closest match is LZRW1, but it's plainly not as fast or tight as LZO. There may be newer ones out there, but I'm not aware of anything as fast as LZO in the Lempel-Ziv family, and speed is critical here.

> In particular, I'm looking at SquashFS (a compressed
> file-system) which uses 'zlib' for compression and is very
> fast and stable (it's widely used for LiveCDs).

Zlib uses the modified LZ77 that is the basis of gzip. It's faster than most file-system-level compressors, but it's substantially slower than LZO or WK*. It's possible that processors are fast enough now that zlib will provide *some* benefit, but you will definitely be losing out on the potential gains that LZO (or something like it) could provide on byte-oriented data.

Scott
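To make the deferred-packing trade-off described above concrete, here is a minimal C sketch of the idea. It is not the actual WKdm source; the tag criteria, names, and 2-bit tag layout are placeholders for illustration. The modeling loop writes one tag per byte with no bit manipulation, and a separate tight loop packs four 2-bit tags into each output byte.

/* Illustrative sketch of deferred tag packing -- not the WKdm sources.
 * Assumes 2-bit tags and a 1024-word (4 KB) page. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define PAGE_WORDS 1024

/* Pass 1: the modeling loop emits one tag per byte -- plain byte stores,
 * no shifting or masking on the hot path. */
void model_page(const uint32_t *page, uint8_t *tags /* PAGE_WORDS bytes */)
{
    for (size_t i = 0; i < PAGE_WORDS; i++) {
        uint32_t w = page[i];
        if (w == 0)
            tags[i] = 0;              /* zero word                        */
        else if ((w & 0x3ff) == w)
            tags[i] = 1;              /* stand-in for a "small value"     */
        else
            tags[i] = 2;              /* stand-in for a dictionary miss   */
        /* a real modeler would also emit dictionary indices, low bits, ... */
    }
}

/* Pass 2: a separate tight loop packs four 2-bit tags into each output
 * byte.  Keeping this out of the modeling loop is the speed trade-off
 * described above: more temporary space, much less per-word bit work. */
size_t pack_tags(const uint8_t *tags, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < PAGE_WORDS; i += 4)
        out[o++] = (uint8_t)(tags[i]
                             | (tags[i + 1] << 2)
                             | (tags[i + 2] << 4)
                             | (tags[i + 3] << 6));
    return o;                         /* 256 packed tag bytes per page    */
}

int main(void)
{
    static uint32_t page[PAGE_WORDS];     /* pretend page: mostly zeroes  */
    static uint8_t  tags[PAGE_WORDS];     /* 1 KB scratch, one tag/byte   */
    static uint8_t  packed[PAGE_WORDS / 4];

    page[3] = 7;                          /* a "small value" word         */
    page[9] = 0xdeadbeef;                 /* a "miss" word                */
    model_page(page, tags);
    printf("packed tag bytes: %zu\n", pack_tags(tags, packed));
    return 0;
}

With four tags per byte, the packed tag area for a 4 KB page is 256 bytes, and the one-byte-per-tag scratch array costs 1 KB -- the kind of temporary overhead the message above argues is negligible.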
From: Nitin G. <nit...@gm...> - 2006-04-17 17:12:36
Hello Scott Kaplan,

Scott Kaplan wrote:
> Nitin,
>
> I'm the `K' in WK, so I might be some kind of help. (Then again, I
> might not!)
>
> WKdm was written only to be research-grade: It worked well enough for
> experimental uses. So, I'm not surprised that you're having a bit more
> difficulty with it. Can you give us any more information about the
> errors? Do the mangled pages have `minor' errors (just a few bytes that
> are incorrect), or are there large chunks of the page that are
> incorrect? Is it possible for the compression/decompression to be
> interrupted? It was not written to support that.

I just missed the fact that WKdm is currently hard-coded to work only for an input of 1024 words -- it's working perfectly fine for 4K input. Currently I'm just testing it in user space and have not yet started porting it to kernel space.

One particular thing I'm trying to improve is to minimize the extra memory it requires during compression. Presently, it stores tags and other bits one per byte (in the case of a partial or exact match) and later packs them together. I think this packed output can be directly generated without this intermediate step -- this should greatly reduce its memory footprint.

> Nitin Gupta wrote:
>> WKdm is fastest with nice compression ratios so it'll be
>> nice to have it working instead of WK4x4
>
> I will note, in case you care, that WK4x4 could be much faster if it
> were re-written from the WKdm code-base. WKdm had some hand-optimizing
> (at the C level) that WK4x4 did not enjoy. It's possible that WK4x4
> could give its better compression ratios with a (de)compression time
> that is close to WKdm's.

Does WK4x4 give better compression (than WKdm) for textual data as well (since I'm currently working on compressing only page-cache data)? Also, I'll be trying some optimizations for both WKdm and WK4x4 in terms of speed, compression, and memory overhead when adapting them to kernel space. I think better compression with a slightly slower algorithm will be better than lower compression with a somewhat faster algorithm.

> Rodrigo de Castro wrote:
>> Remember that WKdm is very nice with swap cache data, but not
>> with general page cache data.
>
> Right. The WK algorithms were written with VM-based (primarily heap)
> pages in mind. Something like LZO is likely to do better with file
> system pages.
>
> Scott

Can you please suggest any other alternatives to LZO for compressing file-system pages? Basically, the problem with LZO is that it's not well documented, and its implementation is cluttered with portability workarounds (for specific architectures, operating systems, compiler versions, etc.) and is highly optimized, which makes it difficult to understand. So, although it can be made to work in kernel mode (Rodrigo's cc-patch already does this very well!), it's difficult to adapt it to compressed-cache requirements and to understand its memory requirements.

In particular, I'm looking at SquashFS (a compressed file-system) which uses 'zlib' for compression and is very fast and stable (it's widely used for LiveCDs). It would also give the benefit of having de/compression algorithms already in kernel space that are especially suited to file-system data, and zlib is well documented. But since SquashFS is not worried about compression time (compression there is done 'offline'), that part may still have to be worked out.

Best Regards,
Nitin
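As a reference point for the zlib route mentioned above, here is a minimal user-space sketch of compressing a single 4 KB page with zlib's convenience API. The buffer sizes, the use of Z_BEST_SPEED, and the dummy page contents are illustrative choices, not a recommendation; in-kernel code would go through the kernel's own zlib interface rather than this user-space API.

/* Minimal user-space sketch: compress one 4 KB page with zlib's
 * convenience API, only to show the shape of the call. */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define PAGE_SIZE 4096

int compress_page(const unsigned char *page, unsigned char *out,
                  unsigned long *out_len)
{
    *out_len = compressBound(PAGE_SIZE);   /* worst-case output size */
    /* Z_BEST_SPEED trades ratio for time, the relevant axis for a
     * compressed cache; the default level is noticeably slower. */
    int ret = compress2(out, out_len, page, PAGE_SIZE, Z_BEST_SPEED);
    return ret == Z_OK ? 0 : -1;
}

int main(void)
{
    static unsigned char page[PAGE_SIZE];      /* pretend page contents  */
    static unsigned char out[PAGE_SIZE + 64];  /* >= compressBound(4096) */
    unsigned long out_len;

    memset(page, 'A', sizeof(page));           /* trivially compressible */
    if (compress_page(page, out, &out_len) == 0)
        printf("4096 -> %lu bytes\n", out_len);
    return 0;
}

Even a sketch like this makes the trade-off visible: zlib's ratios on file data are good, but its per-page compression time is the part that would need measuring against LZO before relying on it for a compressed cache.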