linuxcompressed-devel Mailing List for Linux Compressed Cache (Page 7)
From: Nitin G. <nit...@gm...> - 2006-05-29 19:39:36
Hi,

>>> About CC, I and Briglia (and...@gm...) are making some tests with the
>>> Rodrigo Castro version (kernel 2.4) for the ARM platform. We are
>>> studying a possible port to 2.6 and would like to cooperate with the
>>> project.
>>
>> Great!
>>
>> My exams (and other misc things) just ended today... now I can really
>> jump into compressed caching :)
>> Are you doing ccaching for "general" desktop systems or some kind of
>> embedded device?
>
> Some kind of embedded device. :) Actually we tested and did some
> measurements using Rodrigo's patches for kernel 2.4.18. We tried to
> implement swap compression on the 2.6 kernel, but we stopped when
> recording to disk presented itself as a problem.

(I assume you are using flash disks for storage.) So, have you dropped
the idea of swapping to flash disks? (Compressed) swapping to flash
disks has lots of issues, as far as I have read.

>> Just in case you didn't notice it earlier -- I have made a page,
>> linux-mm.org/CompressedCaching, giving all the details collected so far
>> for a ccaching implementation on 2.6.x kernels (CompressedCaching/Code
>> has the bare-bones implementation created so far). Also, this is where
>> I'll keep track of the project.
>
> Yes, we have been visiting your project's site. Will you use just the
> site, or should we communicate on this list as well?

The site is just to keep design details updated. We can always
communicate through this list (and #linux-mm-cc).

>> I intend to now focus on ccaching w.r.t. OLPC systems. Thus, I'll now
>> work on handling anon pages first instead of page cache pages.
>>
>> The biggest thing is the compression storage structure -- the heart of
>> all this work. I've actually received zero feedback on this so far.
>> Since it also involves my semi-random guesswork, it really needs your
>> help :) All other work revolves around this thing. In particular,
>> please see the 'Problems' listed at the end of that page.
>
> OK. We intend to help you and the others design this crucial part.

Thanks!

> How about that chunks approach? We thought you had already decided to
> use it.

Yes, I've decided to use the chunk approach as described on the
CompressedCaching page. But if you can still give your feedback, we can
always improve it.

>> PS: I have zero idea of ARM systems!
>
> Don't worry about this. The kernel core is similar. Of course we have
> other issues to care about, but the design of ccaching can be used on
> both systems.

Best Regards,
Nitin Gupta
From: Anderson B. <bri...@gm...> - 2006-05-29 18:06:00
Hi Nitin and others,

>> About CC, I and Briglia (and...@gm...) are making some tests with the
>> Rodrigo Castro version (kernel 2.4) for the ARM platform. We are
>> studying a possible port to 2.6 and would like to cooperate with the
>> project.
>
> Great!
>
> My exams (and other misc things) just ended today... now I can really
> jump into compressed caching :)
> Are you doing ccaching for "general" desktop systems or some kind of
> embedded device?

Some kind of embedded device. :) Actually we tested and did some
measurements using Rodrigo's patches for kernel 2.4.18. We tried to
implement swap compression on the 2.6 kernel, but we stopped when
recording to disk presented itself as a problem.

> Just in case you didn't notice it earlier -- I have made a page,
> linux-mm.org/CompressedCaching, giving all the details collected so far
> for a ccaching implementation on 2.6.x kernels (CompressedCaching/Code
> has the bare-bones implementation created so far). Also, this is where
> I'll keep track of the project.

Yes, we have been visiting your project's site. Will you use just the
site, or should we communicate on this list as well?

> I intend to now focus on ccaching w.r.t. OLPC systems. Thus, I'll now
> work on handling anon pages first instead of page cache pages.
>
> The biggest thing is the compression storage structure -- the heart of
> all this work. I've actually received zero feedback on this so far.
> Since it also involves my semi-random guesswork, it really needs your
> help :) All other work revolves around this thing. In particular,
> please see the 'Problems' listed at the end of that page.

OK. We intend to help you and the others design this crucial part.
How about that chunks approach? We thought you had already decided to
use it.

> PS: I have zero idea of ARM systems!

Don't worry about this. The kernel core is similar. Of course we have
other issues to care about, but the design of ccaching can be used on
both systems.

Best regards,
Anderson Briglia

PS: this is the e-mail address that I'll use for discussion about
ccaching.
From: Nitin G. <nit...@gm...> - 2006-05-29 16:41:30
Hi Allan and Briglia,

> Hi All,
>
> First, congratulations Gupta!! Very good news!

Thanks! :)

> About CC, I and Briglia (and...@gm...) are making some tests with the
> Rodrigo Castro version (kernel 2.4) for the ARM platform. We are
> studying a possible port to 2.6 and would like to cooperate with the
> project.

Great!

My exams (and other misc things) just ended today... now I can really
jump into compressed caching :)
Are you doing ccaching for "general" desktop systems or some kind of
embedded device?

Just in case you didn't notice it earlier -- I have made a page,
linux-mm.org/CompressedCaching, giving all the details collected so far
for a ccaching implementation on 2.6.x kernels (CompressedCaching/Code
has the bare-bones implementation created so far). Also, this is where
I'll keep track of the project.

I intend to now focus on ccaching w.r.t. OLPC systems. Thus, I'll now
work on handling anon pages first instead of page cache pages.

The biggest thing is the compression storage structure -- the heart of
all this work. I've actually received zero feedback on this so far.
Since it also involves my semi-random guesswork, it really needs your
help :) All other work revolves around this thing. In particular,
please see the 'Problems' listed at the end of that page.

PS: I have zero idea of ARM systems!

Best Regards,
Nitin Gupta
From: Allan B. <all...@gm...> - 2006-05-29 14:14:27
>> Nitin Gupta wrote:
>> Hi All,
>>
>> GOOD NEWS!!
>>
>> This project has been selected for Google SoC with Rik van Riel as
>> mentor (Fedora org)!!!
>> (http://fedoraproject.org/wiki/SummerOfCode)
>>
>> Coming back to its development as soon as these exams and the party
>> hangover end... :-)

Hi All,

First, congratulations Gupta!! Very good news!

About CC, I and Briglia (and...@gm...) are making some tests with the
Rodrigo Castro version (kernel 2.4) for the ARM platform. We are
studying a possible port to 2.6 and would like to cooperate with the
project.

Best Regards,
--
Allan Bezerra
From: Nitin G. <nit...@gm...> - 2006-05-24 14:59:26
Hi Rodrigo,

Great to hear from you! I really hope to come up with something useful
this summer.

Thanks :)

Best Regards,
Nitin

> Wow, Nitin, congratulations!! Very good news!
>
> It is going to be hard work, but I am sure it will be successful, in
> particular having Rik as your mentor! Please keep sharing your
> development progress with us.
>
> Cheers,
>
> Rodrigo
>
> Nitin Gupta wrote:
>> Hi All,
>>
>> GOOD NEWS!!
>>
>> This project has been selected for Google SoC with Rik van Riel as
>> mentor (Fedora org)!!!
>> (http://fedoraproject.org/wiki/SummerOfCode)
>>
>> Coming back to its development as soon as these exams and the party
>> hangover end... :-)
From: Rodrigo S de C. <ro...@te...> - 2006-05-24 13:49:09
Wow, Nitin, congratulations!! Very good news!

It is going to be hard work, but I am sure it will be successful, in
particular having Rik as your mentor! Please keep sharing your
development progress with us.

Cheers,

Rodrigo

Nitin Gupta wrote:
> Hi All,
>
> GOOD NEWS!!
>
> This project has been selected for Google SoC with Rik van Riel as
> mentor (Fedora org)!!!
> (http://fedoraproject.org/wiki/SummerOfCode)
>
> Coming back to its development as soon as these exams and the party
> hangover end... :-)
From: Nitin G. <nit...@gm...> - 2006-05-24 11:01:59
Hi All,

GOOD NEWS!!

This project has been selected for Google SoC with Rik van Riel as
mentor (Fedora org)!!!
(http://fedoraproject.org/wiki/SummerOfCode)

Coming back to its development as soon as these exams and the party
hangover end... :-)

Cheers,
Nitin
From: <asb...@if...> - 2006-05-24 10:28:30
Anderson Briglia wrote:
> I'm trying to implement compressed swap on the 2.6 kernel. I have
> already read Nitin Gupta's wiki page about the implementation of
> compressed cache on the 2.6.x kernel.
> Well, what I'm trying to do is quite a bit simpler. I just want to
> use a compressed swap area (not the swap cache, just the swap area).
<snip>

Hmm, I think you can do this already by putting the swap on a loopback
device that does compression; if not, that might be the route to go...
less kernel hacking :)

Regards,
Asbjørn Sannes
From: Anderson B. <bri...@gm...> - 2006-05-23 21:19:10
Hi all,

I'm trying to implement compressed swap on the 2.6 kernel. I have
already read Nitin Gupta's wiki page about the implementation of
compressed cache on the 2.6.x kernel.

Well, what I'm trying to do is quite a bit simpler. I just want to use
a compressed swap area (not the swap cache, just the swap area). I'm
using the LZO algorithm to compress the pages, but I'm having problems
recording a compressed page to the swap device (swap area).

How can I use the bio structure to include a compressed page? The bio
structure already uses PAGE_SIZE as the size; can I pass comp_size to
the bio instead of PAGE_SIZE?

Best Regards,
Anderson Briglia
From: Nitin G. <nit...@gm...> - 2006-05-02 18:19:53
Hi All,

There has been very little (coding) progress in the last week as I have
been really busy -- the exams are again on my head. They start on 16th
May, and at most I will be able to work on this till 10th May. However,
these are the _last_ ones I have to take (they end on 26th May). After
this I can work on it with full force! :)

Also, I have revised the specs for future ccache work, documented some
of the background info, and posted it on
linux-mm.org/CompressedCaching. This is where I will keep track of this
project (basically all planning work -- CVS and the mailing list will
stay here) so that we know where we are heading. This is a wiki page
and anyone can edit it. I hope this will help with much more active
design/planning (and hence coding) work. Ah... why didn't I do this
earlier??

Please have a look at it and post _any_ suggestions you have.

Cheers :)
Nitin
From: Nitin G. <nit...@gm...> - 2006-04-27 12:31:46
Hi,

Flávio Etrusco wrote:
> - maybe this serialization is not that bad? What order of time is
> compress() expected to take?

I haven't yet measured how much time compress() actually takes (I
expect on the order of a few hundred ns for WKdm). But even if WKdm is
fast, I don't expect LZO to be fast enough to be serialized. But then
we can keep WKdm serialized and LZO kmalloc()ed. I am not sure of any
single solution that will be good for systems of different speeds.
Maybe, maybe we are overestimating the cost of kmalloc()s... I don't
know. Anyway, both these methods are worth proper benchmarks -- maybe
then we'll be able to see.

> - if the system is low on (free) RAM, isn't using kmalloc likely to
> double the pages to swap out?

If the system is so low on memory that it can't even allocate a few
KBs without swapping, then ccache had better start freeing pages :) It
should be allowed to add pages only when free RAM space is above a
certain threshold.

> - are you sure you're not neglecting the overhead of kmalloc?

I am not sure if the kmalloc overhead is negligible or not compared to
serialization. Also, which of these should be faster:

	kmalloc(sizeof(WK_word)*300, GFP_KERNEL);
	kmalloc(sizeof(WK_word)*300, GFP_KERNEL);
	kmalloc(sizeof(unsigned short)*1200, GFP_KERNEL);

or a single kmalloc allocating the sum of the above three spaces?

Cheers,
Nitin
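The three-versus-one kmalloc question above can be sketched by bundling
the scratch buffers into one struct, so a single allocation serves all
three. This is a hypothetical userspace sketch only -- malloc stands in
for kmalloc, WK_word is assumed to be a 32-bit word, and the member
names are illustrative:

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t WK_word;  /* assumption: 32-bit words */

/* Bundle the three scratch buffers from the message above so that a
 * single allocation replaces three separate ones. */
struct wkdm_scratch {
    WK_word        tags[300];       /* sizeof(WK_word) * 300 */
    WK_word        qpos[300];       /* sizeof(WK_word) * 300 */
    unsigned short low_bits[1200];  /* sizeof(unsigned short) * 1200 */
};

/* One allocation instead of three; one free instead of three. */
static struct wkdm_scratch *scratch_alloc(void)
{
    return malloc(sizeof(struct wkdm_scratch));
}
```

One caveat: a struct may pad its members for alignment, so the combined
size can come out slightly larger than the sum of the three separate
requests.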
From: Rodrigo S de C. <ro...@te...> - 2006-04-26 16:54:22
Flávio Etrusco wrote:
> PS. What about changing the default of the list to reply-to-list? ;-)

Done. I hope it works from now on :-)

Rodrigo
From: <fla...@gm...> - 2006-04-26 16:33:45
Hi,

>> Static allocation will require calls to compress() to be
>> serialized [...]
>
> Good point! Stick to your idea. :-)

May I raise some points from an uninformed POV? ;-)

- maybe this serialization is not that bad? What order of time is
  compress() expected to take?
- if the system is low on (free) RAM, isn't using kmalloc likely to
  double the pages to swap out?
- are you sure you're not neglecting the overhead of kmalloc?

Cheers,
Flávio

PS. What about changing the default of the list to reply-to-list? ;-)
From: Scott K. <sfk...@am...> - 2006-04-26 15:07:21
Nitin,

Nitin wrote:
> I was trying get_jiffies_64()

From what I can tell, this function returns a simple count of the
jiffies (1/100ths of a second) as a 64-bit integer. It's not in
nanoseconds, and a (de)compression occurs within much less than one
jiffy. I think you want sched_clock(), which not only returns
nanoseconds, but uses rdtscll() to obtain the time using the cycle
counter (unless it's a NUMA machine, which I assume is not your first
concern here).

> Static allocation will require calls to compress() to be
> serialized [...]

Good point! Stick to your idea. :-)

> Isn't static allocation a special case of global allocation
> where the variable name is hidden in function scope?

I'd call it the other way around: both global variables and local
variables annotated with `static' result in statically allocated space
that becomes part of the executable's image. Only one requires the
language-level keyword `static', but both are statically allocated and
ultimately located in the same region in memory.

Scott
From: Nitin G. <nit...@gm...> - 2006-04-26 11:16:55
Hello Scott Kaplan,

>> Currently I have some problem getting the timing information
>> needed to see how much time de/compress() is taking.
>
> Can't you turn off power-saving CPU features and use rdtscll()?

I was trying get_jiffies_64(), following how printk() gets timing info
when kernel debugging is turned on. Surprisingly, I was getting the
same time before and after compress() even though it gives time in
nanoseconds -- so there is certainly some kind of problem with how I'm
using it. I will now try your suggestion.

>> I was just thinking to bring the temp buffers down to <4k
>> so that I can de/allocate a single (0-order) page for these buffers.
>
> Why not allocate them statically? Then you don't have to do the work
> of allocating and freeing even these 0-order pages. Even if it is
> fast, it's going to be frequently repeated work that can easily be
> avoided. In any case, either approach is going to be better than
> kmalloc/kfree.

Static allocation will require calls to compress() to be serialized
using locks, since the same allocated buffer space will be used for
each call. This will prevent more than one compress() from happening
simultaneously, like:

	compress (page 1) {
		lock temp_buffers()
		--- pre-empt ---
			compress (page 2) {
				wait for lock on buffers()
				do some work()
				unlock temp_buffers()
			}
		// got lock
		do some work()
		unlock temp_buffers()
	}

This waiting for the lock is not required if these buffers are
dynamically allocated, since then each call has separate buffer space
to work in.

Isn't static allocation a special case of global allocation where the
variable name is hidden in function scope? Is there any other
difference between static and global?

Best Regards,
Nitin
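Nitin's lock-contention trace above can be mimicked in userspace. This
hypothetical sketch uses a pthread mutex in place of a kernel lock,
with the "real work" reduced to a single byte copy; the function names
are illustrative:

```c
#include <pthread.h>
#include <stdlib.h>

#define SCRATCH_WORDS 1200

/* Static buffer: shared by all callers, so every call must hold the
 * lock and concurrent compressions serialize. */
static unsigned short shared_scratch[SCRATCH_WORDS];
static pthread_mutex_t scratch_lock = PTHREAD_MUTEX_INITIALIZER;

int compress_static(const unsigned char *page)
{
    pthread_mutex_lock(&scratch_lock);
    shared_scratch[0] = page[0];   /* stand-in for the real work */
    int out = shared_scratch[0];
    pthread_mutex_unlock(&scratch_lock);
    return out;
}

/* Dynamic buffer: each call gets private scratch space, so no lock is
 * needed, but each call pays the allocator cost. */
int compress_dynamic(const unsigned char *page)
{
    unsigned short *scratch = malloc(SCRATCH_WORDS * sizeof(*scratch));
    if (!scratch)
        return -1;
    scratch[0] = page[0];          /* stand-in for the real work */
    int out = scratch[0];
    free(scratch);
    return out;
}
```

The trade-off is exactly the one debated in the thread: the static
variant does no allocation but serializes all callers; the dynamic
variant allows concurrency at the price of an allocation per call.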
From: Scott K. <sfk...@am...> - 2006-04-25 12:52:56
Nitin,

> Currently I have some problem getting the timing information
> needed to see how much time de/compress() is taking.

Can't you turn off power-saving CPU features and use rdtscll()?

> The kernel-mode stack is only 4k/8k. So, it is not possible to
> allocate these temporary buffers (which are about 4k when
> shorts are used and about 7k when ints/longs are used) on the
> stack.

Good point.

> I was just thinking to bring the temp buffers down to <4k
> so that I can de/allocate a single (0-order) page for these buffers.

Why not allocate them statically? Then you don't have to do the work of
allocating and freeing even these 0-order pages. Even if it is fast,
it's going to be frequently repeated work that can easily be avoided.
In any case, either approach is going to be better than kmalloc/kfree.

> ceil() is now replaced with simple integer operations to get the
> ceiling value.

Nice!

Scott
From: Nitin G. <nit...@gm...> - 2006-04-24 19:10:47
Hello Scott Kaplan,

>> -- the current WKdm uses WK_word temp[1200] to store the unpacked
>> 10-bit patterns, i.e. 1 word (32/64-bit) per 10 bits. Why doesn't it
>> use shorts (16-bit) to store these 10-bit patterns in the unpacked
>> state? Although the 2k memory saving is insignificant, why waste it
>> for free? Was it done for some optimization purpose?
>
> The goal was efficiency. Some architectures don't handle half-words as
> quickly as full words. For the x86, there may be no difference. I'd
> handle this as an empirical issue: compile it using shorts and test
> the speed; change it back to ints/longs and see if it's any faster.
> If not, keep it as 16-bit values.

Currently I have some problem getting the timing information needed to
see how much time de/compress() is taking. I will definitely compare
speeds for these cases. Based on the results, it may be useful to have
arch-specific compile directives -- the code needs minimal changes for
the short-to-int/long conversion in this case, so it would not be a
problem to have some arch-specific optimization too.

>> -- WKdm_compress() does a kmalloc()/kfree() pair three times per
>> call to this function. Is that too much overhead? Should these
>> temp buffers be made global, using locking to serialize
>> requests for them (this isn't performance friendly either)?
>
> The original code, which ran only in user-land, did no heap
> allocating. Given that the compressor will be called very frequently,
> it needs to be fast. If you can allocate these spaces statically or on
> the stack, then do it and save the malloc/free operations.

The kernel-mode stack is only 4k/8k. So, it is not possible to allocate
these temporary buffers (which are about 4k when shorts are used and
about 7k when ints/longs are used) on the stack. I was just thinking to
bring the temp buffers down to <4k so that I can de/allocate a single
(0-order) page for these buffers. De/allocating 0-order pages is fast.
This will eliminate the need for the triple kmalloc/kfree().

Or, maybe allocating a double page is still more efficient than the
triple kmalloc/kfree... I'll benchmark these cases.

>> If the kernel preempts in this ceil(), will the value of 'x' remain
>> valid (the kernel does not save and restore the floating-point
>> processor's state)?
>
> The problem is worse than that: since the kernel does not preserve FP
> registers, use of FP values in the kernel may clobber legitimate
> user-land values. You can't do that. You have to change the use of
> doubles to some kind of integer for kernel code.

ceil() is now replaced with simple integer operations to get the
ceiling value.

Best Regards,
Nitin
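For reference, the usual integer replacement for a floating-point
ceiling is a one-liner, safe under kernel preemption because it touches
no FP registers. A minimal sketch (assuming a non-negative dividend and
a positive divisor; the function name is illustrative):

```c
/* Integer ceiling of a/b, e.g. how many b-byte units cover a bytes.
 * Assumes a >= 0 and b > 0; beware overflow if a is near UINT_MAX. */
static inline unsigned int ceil_div(unsigned int a, unsigned int b)
{
    return (a + b - 1) / b;
}
```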
From: Scott K. <sfk...@am...> - 2006-04-24 16:38:02
Nitin,

> -- the current WKdm uses WK_word temp[1200] to store the unpacked
> 10-bit patterns, i.e. 1 word (32/64-bit) per 10 bits. Why doesn't it
> use shorts (16-bit) to store these 10-bit patterns in the unpacked
> state? Although the 2k memory saving is insignificant, why waste it
> for free? Was it done for some optimization purpose?

The goal was efficiency. Some architectures don't handle half-words as
quickly as full words. For the x86, there may be no difference. I'd
handle this as an empirical issue: compile it using shorts and test the
speed; change it back to ints/longs and see if it's any faster. If not,
keep it as 16-bit values.

> -- WKdm_compress() does a kmalloc()/kfree() pair three times per
> call to this function. Is that too much overhead? Should these
> temp buffers be made global, using locking to serialize
> requests for them (this isn't performance friendly either)?

The original code, which ran only in user-land, did no heap allocating.
Given that the compressor will be called very frequently, it needs to
be fast. If you can allocate these spaces statically or on the stack,
then do it and save the malloc/free operations.

> If the kernel preempts in this ceil(), will the value of 'x' remain
> valid (the kernel does not save and restore the floating-point
> processor's state)?

The problem is worse than that: since the kernel does not preserve FP
registers, use of FP values in the kernel may clobber legitimate
user-land values. You can't do that. You have to change the use of
doubles to some kind of integer for kernel code.

Scott
From: Nitin G. <nit...@gm...> - 2006-04-23 20:16:36
Hi,

I just finished porting WKdm to kernel mode and have created a small
kernel module (attached) to test it separately from other things.

The module creates two /proc entries: /proc/compress and
/proc/decompress. Writing a file to /proc/compress compresses it and
stores it in an internal buffer, and reading /proc/decompress shows the
decompressed output. Also, reading /proc/compress shows some stats
(currently only the original and compressed size -- I had trouble
adding timer info). I think this will be useful when porting other
de/compression algorithms too. A file written to /proc/compress should
be <= 4k (<4k files are padded to make them 4k).

However, I have some doubts:

-- the current WKdm uses WK_word temp[1200] to store the unpacked
10-bit patterns, i.e. 1 word (32/64-bit) per 10 bits. Why doesn't it
use shorts (16-bit) to store these 10-bit patterns in the unpacked
state? Although the 2k memory saving is insignificant, why waste it for
free? Was it done for some optimization purpose? Currently, I've
changed it to use shorts as mentioned, but I can always revert if it
was done purposefully.

-- WKdm_compress() does a kmalloc()/kfree() pair three times per call
to this function. Is that too much overhead? Should these temp buffers
be made global, using locking to serialize requests for them (this
isn't performance friendly either)?

-- the use of ceil() has been removed: I had doubts about Rodrigo's
implementation, which does:

	ceil (double x)
	{
		if (!(x - ((int) x)))
			return ((unsigned int) x);
		return ((unsigned int) (x + 1));
	}

If the kernel preempts in this ceil(), will the value of 'x' remain
valid (the kernel does not save and restore the floating-point
processor's state)?

Now I'll try actually compressing pages with WKdm and then port other
algorithms like WK4x4 and LZO (copy-paste from Rodrigo's patch :) when
the use of WKdm in the cc-patch goes stable.

Cheers,
Nitin
From: Scott K. <sfk...@am...> - 2006-04-17 18:27:48
Nitin,

> One particular thing I'm trying to improve is to minimize the
> extra memory it requires during compression. Presently, it
> stores tags and other bits one per byte (in case of a partial
> or exact match) and later packs them together. I think this
> packed output can be directly generated without this
> intermediate step -- this should greatly reduce its memory
> footprint.

I think you're optimizing the wrong thing here. We chose to defer the
packing of bits in this manner intentionally: it improves the
algorithm's speed substantially (by more than 3x). In exchange, we did
need temporary space to store the bits, but the arrays required still
leave WK* using less book-keeping space than any other compressor we
encountered. For example, LZO needs a 64 KB dictionary, while WKdm uses
something like 20 KB.

Moreover, real systems have tens to hundreds of thousands of pages
(e.g. 256 MB, small by current standards, is 65,536 page frames). Use
of 5 (WKdm) to 16 (LZO) pages for temporary data structures accounts
for 0.008% to 0.02% of total main memory consumption, respectively.
Shrinking the footprint will yield no performance improvement -- the
noise in measuring running time will be much larger than the
performance gained by increasing main memory capacity by 0.02% (in the
best case!). Meanwhile, you will have slowed compression substantially,
increasing the cost of keeping pages compressed and thus limiting the
ideal compressed cache size. You will have lost much more in overall
performance than you gained.

If you want to improve the WK* algorithms, try:

(a) A 32- or 64-word dictionary
(b) Varying the associativity (not just direct-mapped and 4-way set
    associative)
(c) Hand-coding it in assembly for speed

> Does WK4x4 give better compression (than WKdm) for textual
> data also (since I'm currently working on compressing only
> page-cache data)?

Somewhat, but not especially. Both assume that the fundamental input
symbol for compression is a word, not a byte. They are looking for
regularities in integer and pointer values, which is not what most
file data is going to look like.

> I think better compression with a slightly slower algorithm
> will be better than lower compression with a somewhat faster
> algorithm.

Please see my dissertation:

http://www.cs.amherst.edu/~sfkaplan/papers/sfkaplan-dissertation.ps.gz

I've already done this analysis. Basically, you can choose either
improved speed or tightness, and either one will comparably improve
overall performance, although through different compressed cache
sizes. Saying that one is more desirable than the other is nonsensical
-- both are equally valuable here. What you *do* want is the ability
either (a) to improve compressibility for a minimal decrease in speed,
or (b) to improve speed with a minimal decrease in compressibility. If
you give up too much of one for the other, you've lost.

> Can you please suggest any other alternatives to LZO for
> compressing file-system pages?

The closest match is LZRW1, but it's plainly not as fast or tight as
LZO. There may be newer ones out there, but I'm not aware of anything
as fast as LZO in the Lempel-Ziv family, and speed is critical here.

> In particular, I'm looking at SquashFS (a compressed
> file-system) which uses 'zlib' for compression and is very
> fast and stable (it's widely used for LiveCDs).

Zlib uses the modified LZ77 that is the basis of gzip. It's faster than
most file-system-level compressors, but it's substantially slower than
LZO or WK*. It's possible that processors are fast enough now that zlib
will provide *some* benefit, but you will definitely be losing out on
the potential gains that LZO (or something like it) could provide on
byte-oriented data.

Scott
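Scott's footprint figures above are easy to re-derive. A small sketch
(assuming 4 KB pages; the helper names are illustrative):

```c
/* Re-derivation of the footprint figures above, assuming 4 KB pages. */
static unsigned long page_frames(unsigned long mem_bytes)
{
    return mem_bytes / 4096;               /* 256 MB -> 65,536 frames */
}

/* Percentage of main memory consumed when 'pages' frames hold
 * temporary (scratch) data structures. */
static double scratch_overhead_pct(unsigned long pages,
                                   unsigned long frames)
{
    return 100.0 * (double)pages / (double)frames;
}
```

With 65,536 frames, 5 pages of WKdm scratch come to roughly 0.008% of
main memory and 16 pages of LZO scratch to roughly 0.02%, matching the
figures in the message.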
From: Nitin G. <nit...@gm...> - 2006-04-17 17:19:11
Hi Rodrigo,

Rodrigo S de Castro wrote:
> Nitin Gupta wrote:
>> I was testing the WKdm de/compression algorithm and surprisingly it's
>> not working!! The decompressed and original text do not match.
>> Rodrigo's patch contains exactly this algorithm implementation (did
>> he make some changes I can't see?) and it works (though I never tried
>> that patch myself).
>
> What exactly is not working?
>
> I remember I changed some things in WKdm to make it compile. If I
> recall correctly, WKdm uses some userspace libraries not available in
> kernel space (ceiling/floor math functions?) and I implemented these
> functions somewhere in the code and made them available for WKdm to
> work. If this is the problem you are facing, I may take a look at the
> code to recall more details.

Ah... I just did not look into the WKdm code carefully. It is currently
hard-coded to work only when the number of input words is 1024 (i.e.
4K pages) and I was testing it for 2, 256, 512, etc. words -- it works
perfectly well for 4K input. I have not yet started taking it to kernel
space and was just testing it in user space.

There are many optimizations I can see in the current WKdm
implementation, and I'll first try to get some of them working in user
space and then adapt the code to kernel space.

>> Can somebody please look into the WKdm implementation
>> (http://www.cs.utexas.edu/users/oops/compressed-caching/WKdm.tgz) and
>> tell me what's wrong?
>>
>> (WK4x4 seems to work okay, while LZO is a pain in the neck to get
>> into the kernel... so I've dropped the LZO idea for now.)
>
> LZO works in my patch, but I cannot remember what I did for that.

Yes, I saw your patch has included LZO. But I could not figure out how
much of a memory footprint it requires during compression of a page.
Due to the design of the compressed cache I am trying to implement, it
will be difficult to use any compression algorithm as a 'black box'.

Since WKdm and WK4x4 are well documented and easy to understand, I am
focusing much more on them. Do you have any documents describing the
LZO algorithm in detail -- I could not find any myself? Or maybe your
own study of LZO? This would be of great help :)

>> WKdm is the fastest with nice compression ratios, so it would be nice
>> to have it working instead of WK4x4 (we can always include both -- or
>> more -- of them later, but for now...).
>
> Remember that WKdm is very nice with swap cache data, but not with
> general page cache data.

What about WK4x4 -- does it show good (I mean, somewhere between LZO
and WKdm) compression for page cache pages as well?

Best Regards,
Nitin
From: Nitin G. <nit...@gm...> - 2006-04-17 17:12:36
Hello Scott Kaplan,

Scott Kaplan wrote:
> Nitin,
>
> I'm the `K' in WK, so I might be some kind of help. (Then again, I
> might not!)
>
> WKdm was written only to be research-grade: it worked well enough for
> experimental uses. So, I'm not surprised that you're having a bit more
> difficulty with it. Can you give us any more information about the
> errors? Do the mangled pages have `minor' errors (just a few bytes
> that are incorrect), or are there large chunks of the page that are
> incorrect? Is it possible for the compression/decompression to be
> interrupted? It was not written to support that.

I just missed the fact that WKdm is currently hard-coded to work only
for input of 1024 words -- it works perfectly alright for 4K input.
Currently I'm just testing it in user space and have not yet started
porting it to kernel space.

One particular thing I'm trying to improve is to minimize the extra
memory it requires during compression. Presently, it stores tags and
other bits one per byte (in case of a partial or exact match) and later
packs them together. I think this packed output can be directly
generated without this intermediate step -- this should greatly reduce
its memory footprint.

>> WKdm is the fastest with nice compression ratios, so it would be
>> nice to have it working instead of WK4x4
>
> I will note, in case you care, that WK4x4 could be much faster if it
> were re-written from the WKdm code-base. WKdm had some hand-optimizing
> (at the C level) that WK4x4 did not enjoy. It's possible that WK4x4
> could give its better compression ratios with a (de)compression time
> that is close to WKdm's.

Does WK4x4 give better compression (than WKdm) for textual data also
(since I'm currently working on compressing only page-cache data)?
Also, I'll be working to try some optimizations for both WKdm and
WK4x4 in terms of speed, compression, and memory overhead when
adapting them to kernel space.

I think better compression with a slightly slower algorithm will be
better than lower compression with a somewhat faster algorithm.

> Rodrigo de Castro wrote:
>> Remember that WKdm is very nice with swap cache data, but not
>> with general page cache data.
>
> Right. The WK algorithms were written with VM-based (primarily heap)
> pages in mind. Something like LZO is likely to do better with file
> system pages.

Can you please suggest any other alternatives to LZO for compressing
file-system pages? Basically, the problem with LZO is that it's not
well documented, and its implementation is cluttered with being very
cross-platform, cross-OS, and cross-compiler-version (workarounds for
specific archs, OSs, compiler versions, etc.) and is highly optimized,
which makes it difficult to understand. So, although it can be made to
work in kernel mode (Rodrigo's cc-patch already does this very well!),
it's difficult to adapt it to compressed cache requirements and to
understand its memory requirements.

In particular, I'm looking at SquashFS (a compressed file-system)
which uses 'zlib' for compression and is very fast and stable (it's
widely used for LiveCDs). It would also give the benefit of having
de/compression algorithms already in kernel space specially suited for
file-system data, and zlib is well documented. But since SquashFS is
not worried about compression times, which are incurred 'offline',
that may have to be worked out there.

Best Regards,
Nitin
From: Scott K. <sfk...@am...> - 2006-04-17 02:13:39
|
Nitin,

I'm the `K' in WK, so I might be some kind of help. (Then again, I might not!)

WKdm was written only to be research-grade: It worked well enough for experimental uses. So, I'm not surprised that you're having a bit more difficulty with it. Can you give us any more information about the errors? Do the mangled pages have `minor' errors (just a few bytes that are incorrect), or are there large chunks of the page that are incorrect? Is it possible for the compression/decompression to be interrupted? It was not written to support that.

Nitin Gupta wrote:
> WKdm is fastest with nice compression ratios so it'll be
> nice to have it working instead of WK4x4

I will note, in case you care, that WK4x4 could be much faster if it were re-written from the WKdm code-base. WKdm had some hand-optimizing (at the C level) that WK4x4 did not enjoy. It's possible that WK4x4 could give its better compression ratios with a (de)compression time that is close to WKdm's.

Rodrigo de Castro wrote:
> Remember that WKdm is very nice with swap cache data, but not
> with general page cache data.

Right. The WK algorithms were written with VM-based (primarily heap) pages in mind. Something like LZO is likely to do better with file system pages.

Scott
|
From: Rodrigo S de C. <ro...@te...> - 2006-04-16 23:22:33
|
Hi Nitin,

Nitin Gupta wrote:
> I was testing the WKdm de/compression algorithm and surprisingly it's not
> working!! The decompressed and original text do not match. Rodrigo's
> patch contains exactly this algorithm implementation (did he make some
> changes which I can't see?) and it works (though I never tried that
> patch myself).

What exactly is not working? I remember I changed some things in WKdm to make it compile. If I recall correctly, WKdm uses some userspace libraries not available in kernel space (ceiling/floor math functions?) and I implemented these functions somewhere in the code and made them available for WKdm to work. If this is the problem you are facing, I can take a look at the code to recall more details.

> Can somebody please look into the WKdm implementation
> (http://www.cs.utexas.edu/users/oops/compressed-caching/WKdm.tgz) and
> tell me what's wrong?
>
> (WK4x4 seems to work okay, while LZO is a pain in the neck to get into
> the kernel... so I've dropped the LZO idea for now.)

LZO works in my patch, but I cannot remember what I did to get it there.

> WKdm is fastest with nice compression ratios so it'll be nice to have
> it working instead of WK4x4 (we can always include both -or more- of
> them later, but for now.....).

Remember that WKdm is very nice with swap cache data, but not with general page cache data.

Cheers,
Rodrigo
|
From: Nitin G. <nit...@gm...> - 2006-04-16 15:50:59
|
Hi,

I was testing the WKdm de/compression algorithm and surprisingly it's not working!! The decompressed and original text do not match. Rodrigo's patch contains exactly this algorithm implementation (did he make some changes which I can't see?) and it works (though I never tried that patch myself).

Can somebody please look into the WKdm implementation (http://www.cs.utexas.edu/users/oops/compressed-caching/WKdm.tgz) and tell me what's wrong?

(WK4x4 seems to work okay, while LZO is a pain in the neck to get into the kernel... so I've dropped the LZO idea for now.)

WKdm is fastest with nice compression ratios so it'll be nice to have it working instead of WK4x4 (we can always include both -or more- of them later, but for now.....).

Cheers,
Nitin
|