Re: [lc-devel] Page cache compression
From: David C. <dav...@sh...> - 2002-03-28 17:06:39
I am a filesystem developer for Linux. I use the page cache for
performance; it would be a pity to add load and latency to page cache
access. It would seriously harm the performance of network-based
filesystems, because they are slow to begin with and are sped up by
using the page cache.

Rodrigo Souza de Castro wrote:

> Hi David et al,
>
> On Fri, Mar 22, 2002 at 10:15:25PM +0800, David Chow wrote:
>
>> Would anyone tell me what page cache compression is? In the change
>> log for 0.22, Rodrigo says the page cache is also compressed. Can
>> Rodrigo explain, please? I would like to know what to be aware of
>> when I implement my filesystem, because I also have compression in
>> my page caches. Thanks.
>
> The 0.22 version includes some new features like page cache
> compression and clean pages support. The latter means that not only
> the dirty pages, but also the clean pages, are added to the
> compressed cache. It's something very sensible for the compressed
> cache idea, but hadn't been implemented so far.

As for me, I write compression filesystems to speed up transfers on
slow media. Over the network the effect is obvious; even on slow IDE
drives, the speed improvement is also obvious.

> The page cache support in compressed cache allows all file-mapped
> pages (ie, page cache pages) to be added to the compressed cache. I
> thought of implementing page cache support after noticing that some
> IO intensive applications didn't have performance gains with the
> "usual" compressed cache (the one which stores only anonymous
> pages). Some tests had been run, and from them some statistics were
> made available on our project page (statistics for version 0.21),
> where we could notice a performance drop when running IO intensive
> applications (for example, dbench) on a kernel with compressed cache
> enabled.
>
> At that moment I didn't think very much about the impact of an idea
> like that and started implementing this support to make compressed
> cache work better for these cases.
> Along with the clean page support, it actually helped IO intensive
> applications, like dbench (you can have a look at the statistics
> page for 0.22pre7 and later versions).

If page cache compression is implemented, that means every write from
user space to pages will have to go through the compressed cache; I
don't think this is a good idea. From my experience, most filesystems
use the generic_file_read/write paths, such that all writes are
synchronous and will call the fs-specific commit_write() directly.
Only shared mmapped pages will delay writes. Cache implementation is
an fs-specific issue. I can't see many applications using shared
mmapped pages for I/O intensive operations.

> Nevertheless, this page cache support is new and experimental, never
> implemented in other compressed cache implementations and not even
> mentioned in Scott Kaplan's thesis. We still have to perform tests
> and some deep analyses to check whether this support will be kept in
> the future, mainly after implementing adaptivity.
>
> You can notice that a compressed cache like that, with swap cache
> and page cache support, will hold pages with very different
> behaviours. Unlike swap cache pages, dirty file-mapped pages need to
> be synced to the disk, which will follow OS-specific parameters
> (kupdated parameters in Linux, for example) or user/application
> parameters (for instance, syscalls, or mounting with the sync flag).
> That's why I don't know exactly when it's worth compressing a
> file-mapped page, since it might be decompressed to be synced right
> after the compression (thus we might be wasting compressions).

To my mind, it only makes sense to compress the clean page cache, so
that the amount of cached data in memory is increased, which means
less I/O.

> Besides that, in both cases, storing a page from either the swap
> cache or the page cache might save us a disk read (if we reference
> it again while we still have it in the compressed cache, and it
> would have to be read from disk otherwise).
> For a swap cache page, it might even save us a disk write, if it
> happens to be compressed and never has to be written out to swap
> until the program exits (in the common case we have dirty swap
> pages). With a page cache page, we are still not sure it might save
> us a disk write, since it will need to be synced soon if dirty.

As I said, the generic read/write paths will sync through all the
fs-specific write calls, and whether to sync to disk or not is an
fs-specific issue; think about mounting NFS with sync versus async.

> Those reasons above make me believe we won't have a single parameter
> for adaptivity that will work with swap and page cache at the same
> time. I don't even think we would be able to find a single parameter
> that would work well only with page cache. Therefore, I am not sure
> this support will be kept when adaptivity gets implemented.
>
> Anyway, I think it's been worth thinking about this problem no
> matter what happens to this page cache support. This page cache
> support showed us that we were right about some assumptions (for
> example, that dbench's bad performance was, at least partially, due
> to smaller page and buffer caches) and also helped us see some
> similar problems in other compressed cache implementations.
>
> Best regards,

I think compressing dirty pages is worthless; it will just increase
the delay before a dirty page is written to disk. I think this is the
most challenging issue: fs developers want to sync all dirty pages to
disk as soon as possible and make dirty pages reach their fs-specific
calls. At the moment, the VFS and kswapd take care of dirty pages, and
fs-specific code has little control over this. For my part, I would
like the dirty pages to come to my fs-specific code so that I can take
control sooner. Journalling filesystems are a trend, and these
filesystems will likely want to commit dirty pages in all transactions
of data writes. Thinking about this, you will understand my point.
I suggest having a compile-time option to support clean page cache
compression; this will have the effect of enlarging the effective page
cache. It also makes page cache compression easier to implement, by
just discarding invalidated compressed pages, for a one-way
implementation.