linuxcompressed-devel Mailing List for Linux Compressed Cache (Page 8)
Status: Beta
Brought to you by: nitin_sf
Message archive by month:

2001: Oct (6), Nov (1), Dec (11)
2002: Jan (22), Feb (11), Mar (31), Apr (19), May (17), Jun (9), Jul (13), Aug (1), Sep (10), Oct (4), Nov (10), Dec (4)
2003: Feb (8), Apr (5), May (39), Jun (10), Jul (2), Aug (1), Sep (1), Oct (27), Nov (1), Dec (2)
2004: Feb (3), Mar (1), Jun (3), Jul (1), Nov (3)
2005: Jan (1), Nov (9), Dec (2)
2006: Jan (7), Feb (4), Mar (12), Apr (16), May (11), Jun (48), Jul (19), Aug (16), Sep (13), Nov (8), Dec (1)
2007: Jan (4), Apr (3), May (26), Jul (1)
2008: Feb (7), Mar (5)
From: Nitin G. <nit...@gm...> - 2006-04-08 20:36:39
|
Hi, I've been very busy with exams for the last week so I couldn't work on it, but I had some time to play more with the patch I last posted. It is quite stable initially, but when swap space usage increases to about 20MB, apps within dear KDE begin to freeze. I tried to look for potential races but couldn't find any. Also, I never get OOPSes even when apps begin to freeze (as such, the system itself never freezes -- I can always go to a terminal and do tasks). I think these app freezes happen because the system is running out of memory because of this patch: it simply doesn't allow clean page-cache (file-backed) pages to be freed, so under memory pressure the system might be killing important processes, leading to the app freezes. So now I'll try to actually compress pages and limit the number of pages compressed to see the real effect. Cheers, Nitin |
From: Nitin G. <nit...@gm...> - 2006-03-10 15:58:27
|
Hi, Just in case you missed it, see this post on LKML -- explains radix trees in detail which is very important for this project! ftp://ftp.kernel.org/pub/linux/kernel/people/npiggin/patches/lockless/2.6.16-rc5/radix-intro.pdf Cheers, Nitin ---------- Forwarded message ---------- From: Nick Piggin <np...@su...> Date: Mar 10, 2006 8:48 PM Subject: A lockless pagecache for Linux To: Linux Kernel <lin...@vg...>, Linux Memory Management <lin...@kv...> Cc: Nick Piggin <np...@su...> Hi, I was waiting for 2.6.16 before releasing my patchset, but that got boring. ftp://ftp.kernel.org/pub/linux/kernel/people/npiggin/patches/lockless/2.6.16-rc5/ Now I've used some clever subject lines on the subsequent patches to make you think this isn't a big deal. Actually there are about 36 other "prep" patches before those, and PageReserved removal before that (which are luckily now mostly in -mm or -linus, respectively). What's more, there aren't 3 lockless pagecache patches, there are 5 -- but the last two are optimisations. I'm writing some stuff about these patches, and I've uploaded a **draft** chapter on the RCU radix-tree, 'radix-intro.pdf' in the above directory (note the bibliography didn't make it -- but thanks Paul McKenney!) If anyone would like to test or review it, I would be very happy. Suggestions to the code or document would be very welcome... but I'm still hoping nobody spots a fundamental flaw until after OLS. Rollup of prep patches (5 posted patches apply to the top of this): 2.6.16-rc5-git14-prep.patch.gz Rollup of prep+lockless patches (includes the 5 posted patches): 2.6.16-rc5-git14-lockless.patch.gz Note: anyone interested in benchmarking should test prep+rollup vs prep rather than vs mainline if possible, because there are various other optimisations in prep. Thanks, Nick -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to maj...@kv.... For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: em...@kv... |
From: <asb...@if...> - 2006-03-05 21:50:51
|
Okay, this is my first try at throwing out and restoring an anonymous page. What it does: stores 10 pages in memory that would have been kicked out, and gets a stored page back on a fault. BUGS: the patch is riddled with whitespace left over after I removed my debugging code; I will clean this up, but I haven't had the time yet. This is, after all, just a look at how things could be designed. Mvh, Asbjørn Sannes |
From: ace <ace...@at...> - 2006-03-02 20:40:39
|
Nitin Gupta wrote: > Hi, > <snip> > * Shrinking ccache: > > A heuristic will be developed (not yet thought out) that will tell if > compressed cache is really worth the RAM space it is occupying. Accordingly > ccache will be shrunk or expanded. > > > * Shrinking ccache is bit of a mess. > No problem, the first iteration could use a fixed size ccache, defined via a kernel parameter, or better, via a tunable /proc/sys/vm entry. Peter |
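A fixed-size ccache along these lines could be wired up with a boot parameter; the sketch below is illustrative only (the name ccache_max_pages and the ccache_pages= option are assumptions, not part of any posted patch), and a /proc/sys/vm entry would additionally need a ctl_table hooked into the kernel's vm sysctl table.

#include <linux/init.h>
#include <linux/kernel.h>

/* Assumed name; not taken from the actual patches discussed above. */
static unsigned long ccache_max_pages = 4096;	/* default: 16 MB of 4 KB pages */

/* Parse "ccache_pages=N" from the kernel command line. */
static int __init ccache_pages_setup(char *str)
{
	ccache_max_pages = simple_strtoul(str, NULL, 0);
	return 1;
}
__setup("ccache_pages=", ccache_pages_setup);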
From: Nitin G. <nit...@gm...> - 2006-03-02 10:51:41
|
Asbjørn Sannes wrote:
> Nitin Gupta wrote:
>> I've been working on Compressed Cache structure for last few days.
>> Following is the brain dump of all the details required for ccache
>> structure implementation. This doc is a *headache* to follow (poorly
>> written, no diagrams included) -- but I'll later sketch them if
>> required for any part.
>
> <snip>
>
>> When shrinking, see pages belonging to which inode's mapping are
>> mostly being freed (using simple hooks in shrink_list() - vmscan.c).
>> This shows that a particular inode's (file's) pages are not being used.
>> So, when shrinking, search the radix tree for compressed pages (search
>> based on PAGECACHE_TAG_COMPRESSED tag) and free all the associated
>> chunks. Since a page can only be freed when all the chunks contained
>> within are freed, relocate the remaining chunks in the page to free
>> chunks in other pages and finally free the page.
>
> I'm not sure I follow, at least I need some clarification: when do you
> know a page is empty (so that it can be freed), or if you choose a page,
> how do you know what compressed page a chunk belongs to?
>
> PS! I might be confused since I have been looking at storing anonymous
> memory into the compressed area.
>
> Mvh, Asbjørn Sannes

You asked:
I) When do you know a page is empty (so that it can be freed)?
II) If you choose a page, how do you know what compressed page a chunk belongs to?

For (I): When a chunk is freed, the following occurs:
1) It is merged with adjacent free chunks.
2) Check if this merged free chunk spans one or more whole pages.
3) If it does, free those pages.

The following gives rough code for this:

// two helper functions (func1, func2)
// round an address down to the base (start) address of its page
func1(addr)
{
	return addr & PAGE_MASK;
}

// get the struct page for a page's base (start) address
func2(base_page_addr)
{
	pfn = (base_page_addr - PAGE_OFFSET) >> PAGE_SHIFT;
	return mem_map + pfn;
}

-- let chunk = final merged chunk obtained after step (1)
-- if chunk->size < PAGE_SIZE then skip (2) and (3)

while (chunk->size >= PAGE_SIZE) {
	if ((chunk->start_addr & ~PAGE_MASK) == 0) {
		// chunk starts at a page boundary, so the whole first page
		// is covered by free space: free it and move past it
		page = func2(chunk->start_addr);	// 'struct page' of that page
		chunk->start_addr += PAGE_SIZE;
		chunk->size -= PAGE_SIZE;
		__free_page(page);
	} else {
		// chunk starts mid-page; delta is the space the chunk uses
		// in the (partial) page where it begins
		base_page_addr = func1(chunk->start_addr);
		delta = base_page_addr + PAGE_SIZE - chunk->start_addr;
		if (chunk->size >= delta + PAGE_SIZE) {
			// split off the leading partial page as its own free chunk,
			// then free the next page, which is fully covered
			create a new chunk with start_addr = chunk->start_addr and size = delta;
			add that chunk to the free list;
			page = func2(base_page_addr + PAGE_SIZE);
			chunk->start_addr += delta + PAGE_SIZE;
			chunk->size -= delta + PAGE_SIZE;
			__free_page(page);
		} else
			break;
	}
} // end of while

For (II) (if you choose a page, how do you know what compressed page a chunk belongs to?): You can't do that. If you need it, we can simply add index and mapping fields of the compressed page to struct chunk, but this will make the metadata overhead big. Basically, I could not see a requirement to know which compressed page a chunk belongs to in any of the ccache operations (insert, delete, shrink, expand). Cheers, Nitin |
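To make the page-spanning check above concrete, here is a small self-contained userspace sketch of the same address arithmetic; PAGE_SIZE, the function names and the example numbers are assumptions for illustration, not kernel code.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u
#define PAGE_MASK (~(uintptr_t)(PAGE_SIZE - 1))

/* First page boundary at or above addr. */
static uintptr_t page_align_up(uintptr_t addr)
{
	return (addr + PAGE_SIZE - 1) & PAGE_MASK;
}

/* How many whole pages lie completely inside [start, start + size)?
 * Those are the pages a merged free chunk would let us free. */
static unsigned long whole_pages_spanned(uintptr_t start, size_t size)
{
	uintptr_t first = page_align_up(start);
	uintptr_t end = start + size;

	if (end < first + PAGE_SIZE)
		return 0;
	return (end - first) / PAGE_SIZE;
}

int main(void)
{
	/* A chunk of 3 * 4096 bytes starting 100 bytes into a page
	 * fully covers two pages, so two pages could be freed. */
	printf("%lu\n", whole_pages_spanned(0x100064, 3 * 4096));
	return 0;
}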
From: <asb...@if...> - 2006-03-01 18:50:56
|
Nitin Gupta wrote: > I've been working on Compressed Cache structure for last few days. > Following is the brain dump of all the details required for ccache > structure implementation. This doc is a *headache* to follow (poorly > written, no diagrams included) -- but I'll later sketch them if > required for any part. <snip> > When shrinking, see pages belonging to which inode's mapping are > mostly being freed (using simple hooks in shrink_list()-vmscan.c). > This shows that a particular inode's (file's) pages are not being used. > So, when shrinking, search the radix tree for compressed pages (search > based on PAGECACHE_TAG_COMPRESSED tag) and free all the associated > chunks. Since a page can only be freed when all the chunks contained > within are freed, relocate the remaining chunks in the page to free > chunks in other pages and finally free the page. > I'm not sure I follow, at least I need some clarification: when do you know a page is empty (so that it can be freed), or if you choose a page, how do you know what compressed page a chunk belongs to? PS! I might be confused since I have been looking at storing anonymous memory into the compressed area. Mvh, Asbjørn Sannes |
From: Nitin G. <nit...@gm...> - 2006-03-01 16:09:32
|
Hi,

I've been working on the Compressed Cache structure for the last few days. Following is the brain dump of all the details required for ccache structure implementation. This doc is a *headache* to follow (poorly written, no diagrams included) -- but I'll later sketch them if required for any part. I intend to start its implementation by next week, so please post any doubts, suggestions, ideas you have :)

This is basically the same as the structure described in that 'Initial Design' doc, but now I have thought it out in more detail suitable for implementation.

First, the goals of a 'good' ccache structure:
-- Locating a page in ccache should be fast.
-- Adding/taking a page to/from ccache should be fast (apart from compression/decompression time).
-- Minimize fragmentation.
-- Avoid double lookups, i.e. a single lookup of the page cache should tell whether a page is in compressed cache or not and also give the location of the page within ccache if it is compressed.
-- Minimize metadata overhead.
-- Page allocation requests should be 0-order (0-order allocations are faster than higher order and almost never fail).
-- Shrinking/expanding ccache should be fast.

(Currently, the focus is on handling page cache (file-backed) pages only. Since I have doubts over swap cache handling, I'll take it up later. Compressed cache for Page Cache and Swap Cache will be separately maintained, but both will have the same ccache structure.)

The following describes the structure which, I think, is 'nearly good'. At the end there is a list of problems that I can currently see with this structure.

1. Overview

(Please see page cache management using radix trees and about 'tags' -- UTLK 3rd ed. is great.) Also, I'll assume that you've read that 'Initial Design' doc so I don't have to retype it here :) The following gives details I left out there:

The ccache will have a 'chunked' structure (please see the 'Initial Design' doc for its overview) -- i.e. each compressed page can be stored in several different blocks of memory that may not be physically contiguous. For every page that is decided to be compressed, its node in the page cache radix tree will be replaced by a 'struct chunk_head *' instead of a page descriptor pointer (struct page *).

'chunk_head' points to the list of chunks storing parts of this compressed page:

struct chunk_head {
	unsigned int nr_chunks;
	unsigned int private;		// stores flags, usage count
	struct chunk *chunk_list;
};

'struct chunk' needs to store the starting location of the chunk and its size:

struct chunk {
	void *start_addr;
	unsigned int size;		// MSB used as 'used' flag
	struct list_head next;
	struct list_head next_chunk;
};

All the chunks in ccache form a doubly linked list using 'next_chunk'. 'next' is for connecting 'related' chunks separately, e.g. the free chunk list, or chunks belonging to the same compressed page.

2. Cache Operations

* Extracting a page from ccache:
The system looks in the page cache (radix tree of a particular inode) and finds that the PAGECACHE_TAG_COMPRESSED tag is set. So the node doesn't hold a (struct page *) but a chunk_head pointer (chunk_head *). Now, chunk_head->chunk_list gives the address of the first 'struct chunk'. This gives the start_addr and size of the compressed page within this chunk. Remaining chunks can be reached using chunk->next. Then the chunks associated with the page just extracted are freed. Check if neighbouring chunks are free (these can be reached using chunk->next_chunk->{prev, next}). If neighbouring chunks are also free (the MSB in chunk->size is cleared), they are merged into a single chunk. Finally, extraneous 'struct chunk's are freed and the final 'struct chunk' obtained after merging adjacent chunks is added to a 'free list' (all free chunks are linked together -- that's the free list).

* Inserting a page in ccache:
I. Compress the page and store it in chunks taken from the free list.
II. Update the radix tree entry corresponding to this page to now point to the 'chunk_head' used for this page.
III. Also update chunk->start_addr and chunk->size of the last chunk according to the space left there.

* Shrinking ccache:
A heuristic will be developed (not yet thought out) that will tell if the compressed cache is really worth the RAM space it is occupying. Accordingly, ccache will be shrunk or expanded. When shrinking, see which inode's mapping most of the freed pages belong to (using simple hooks in shrink_list() - vmscan.c). This shows that a particular inode's (file's) pages are not being used. So, when shrinking, search the radix tree for compressed pages (search based on the PAGECACHE_TAG_COMPRESSED tag) and free all the associated chunks. Since a page can only be freed when all the chunks contained within it are freed, relocate the remaining chunks in the page to free chunks in other pages and finally free the page.

* Expanding ccache:
Simply allocate a new page and add it to the free list.

3. Notes

* This chunked structure basically reduces (hopefully) fragmentation to a large extent (I've no data to support this) since it allows a single compressed page to be stored scattered over many chunks.
* Inserting a page is fast since storage blocks (chunks) are simply taken from the free list (allocating more pages if required). This does not involve touching any other live data within ccache.
* Expanding ccache is a piece of cake!
* Shrinking ccache is a bit of a mess.

4. Problems

* Metadata size: e.g., a compressed page taking 3 chunks requires sizeof(chunk_head) + 3*sizeof(chunk) of metadata (on a 32-bit system this is 36 bytes!!).
* When extracting a compressed page, data is read from several chunks which can be on separate physical pages. This total destruction of locality of reference can seriously affect performance.

It's a bad idea that cannot be changed. -- Unknown

I'll be glad to have any suggestions, ideas you have :)

Cheers, Nitin |
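A minimal 2.6-era sketch of the tagged lookup described above might look like the following; it is illustrative only: PAGECACHE_TAG_COMPRESSED is the tag proposed in this mail (not an existing kernel tag), ccache_lookup() is a hypothetical helper, and the page_tree/tree_lock names and the availability of radix_tree_tag_get() are assumptions about the target kernel.

#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/radix-tree.h>

/* Look up 'index' in a file's page cache radix tree. The returned entry is
 * either a struct page * (uncompressed) or a struct chunk_head * (compressed),
 * distinguished by the proposed PAGECACHE_TAG_COMPRESSED tag. */
static void *ccache_lookup(struct address_space *mapping, unsigned long index,
			   int *is_compressed)
{
	void *entry;

	read_lock_irq(&mapping->tree_lock);
	entry = radix_tree_lookup(&mapping->page_tree, index);
	*is_compressed = entry &&
		radix_tree_tag_get(&mapping->page_tree, index,
				   PAGECACHE_TAG_COMPRESSED);
	read_unlock_irq(&mapping->tree_lock);

	return entry;
}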
From: <asb...@if...> - 2006-03-01 13:58:28
|
Hi,

I have an idea for storing compressed pages, which I think could prove to work out for us. It is based on previous work and some of my own tinkering, which of course means it needs some attention.. :) This proposal has some blanks in it, but they are all easily addressed (I think, but then again, I could be wrong...)

The main idea is to have locality within zones which we can use to resize the cache. It works like this: a local zone contains pages, and each of the pages contains chunks; a structure representing a compressed page points to the first chunk, which in turn points to the next. The first page allocated for a local zone includes the zone header, which contains information about what pages are available and what chunks are in use.

We have a radix tree, say cc_pages, which we use to look up pages by their id; it points to a structure (struct cc_page) which explains to us how to retrieve the page (and any other information we might need). The cc_page tells us which chunk will be the first one, and each chunk has a header before its data that includes the next chunk. cc_page should probably be allocated using the slab, as we create and destroy these all the time.

struct cc_page {
	struct cc_chunk_header *first;
	struct cc_local_zone *zone;
	int bytes_in_last_chunk;
};

cc_local_zone_header, stored in the first page of the zone:

struct cc_local_zone_header {
	int freechunks;
	struct cc_page *compressed_pages[MAX_COMPRESSED_PAGES_IN_ZONE];
	void *zone_pages[MAX_PAGES_IN_ZONE];
	bitmap_of_somesort chunks;	// Have to look up some of this
};

cc_chunk_header, stored at the beginning of the chunk:

struct cc_chunk_header {
	struct cc_chunk_header *next;
};

Insert: find a local zone with enough room; if there is none, kick out pages in LRU order until a zone with enough space is available.

Delete: mark the chunks used as free in the local_zone bitmap and update the number of free chunks.

To make it easy to find a zone with enough space, we could (if there doesn't already exist a data structure that implements this) have an array [roughly PAGE_SIZE/CHUNK_SIZE] of doubly linked lists of zones, where every zone with PAGE_SIZE/CHUNK_SIZE or more chunks available is in the last slot, ordered down by chunks available.. this way we have a fast lookup to find a zone with enough space.

Another thing is that zones can be lazily allocated; by this I mean we don't have to allocate the full number of pages a zone can have right away, but rather wait until a compressed page has used up all chunks in the pages allocated thus far.

For the sizes of the various things, there isn't really a good answer yet as far as I can tell; we probably have to do some tests to figure that out.. I probably forgot some details, and there probably are issues, so please feel free to comment on this!

Mvh, Asbjørn Sannes |
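As a rough illustration of the "array of lists" lookup suggested above, here is a small userspace sketch; the names (zone_hdr, buckets, CHUNKS_PER_ZONE) and the bucketing policy are assumptions, not part of the proposal's actual code.

#define CHUNKS_PER_ZONE 64	/* assumed: roughly PAGE_SIZE / CHUNK_SIZE */

struct zone_hdr {
	int free_chunks;			/* 0 .. CHUNKS_PER_ZONE (or more) */
	struct zone_hdr *next, *prev;		/* neighbours within one bucket */
};

/* buckets[n] holds zones with n free chunks; zones with CHUNKS_PER_ZONE or
 * more free chunks all land in the last slot, as described in the mail. */
static struct zone_hdr *buckets[CHUNKS_PER_ZONE + 1];

/* Find a zone with at least 'needed' free chunks by scanning the buckets
 * from 'needed' upward; returns NULL if the caller must evict (LRU) or
 * allocate a fresh zone. */
static struct zone_hdr *find_zone_with_room(int needed)
{
	int n;

	for (n = needed; n <= CHUNKS_PER_ZONE; n++)
		if (buckets[n])
			return buckets[n];
	return 0;
}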
From: Nitin G. <nit...@gm...> - 2006-02-25 06:48:38
|
Hi, Just committed some more ccache work to CVS (for 2.6.16-rc4). It just copies (no compression) data to another page and replaces the original page with this copied page in the inode radix tree (page cache) -- basically a step closer to the real thing :) Soon, I'll try to implement the 'chunked' scheme (as described in that 'Initial Design' doc). Cheers, Nitin |
From: Nitin G. <nit...@gm...> - 2006-02-21 20:19:10
|
Hi, These are a few lines I added to 2.6.16-rc4 while going through the VMM code. Just some printks for a nice /var/log/messages display :) Posting it here just to highlight some entry points. Exams are just over -- ccache time again :) Cheers, Nitin |
From: <fla...@gm...> - 2006-02-07 23:37:38
|
On 2/7/06, Nitin Gupta <nit...@gm...> wrote: > Hi All, > Very recently upgraded to 2.6.16-rc2 :) It has many changes > all over the VMM -- some are just cosmetic changes (e.g. a large code > block moved into a separate function) while some others require more > thought (what the hell CONFIG_MIGRATION stands for; can we ignore it > for ccache stuff?). Wow, don't you read LWN? ;-) It most certainly has to do with the "page migration for NUMA systems": http://lwn.net/Articles/157066/ http://lwn.net/Articles/160201/ I guess you shouldn't worry about it anytime soon... Best regards, Flávio |
From: Nitin G. <nit...@gm...> - 2006-02-07 23:19:26
|
Hi All, Very recently upgraded to 2.6.16-rc2 :) It has many changes all over the VMM -- some are just cosmetic changes (e.g. a large code block moved into a separate function) while some others require more thought (what the hell does CONFIG_MIGRATION stand for; can we ignore it for ccache stuff?). The initial code I committed in CVSROOT/linux26 is exceptionally broken, and since 2.6.16-rc2 has many changes, I'll rewrite a simplified version of the idea I mentioned in that 'Initial Design' doc to get a better understanding of the changes introduced in this new kernel. Also, my exams are now just a few days away and ccache work is overloading me! These will be over by 20th Feb; so till then I will not be able to work on this project. After these I'll be able to work freely on this project -- till the next exams wave :) Finally, if you have any ideas for ccache, please do post them. Currently I have no clear way of handling swap cache pages. Of course, previous implementation methods exist together with some other solutions that I have rough ideas for -- as I mentioned in that 'Initial design' doc -- but these all have problems as I mentioned in that doc. Till then I'll keep working on handling page cache pages only and later add support for swap cache pages. The project is in very early stages with many exciting ideas (and very little code ;) so it will be great if you join the development too :) Cheers, Nitin |
From: <asb...@if...> - 2006-01-30 11:24:44
|
Nitin Gupta wrote: > Hi All, > > Just took some days off from compressed cache. Back to work now :) Welcome back then :) > Also, current implementation is going towards 'chunked' structure of > ccache. It almost seems like a joke that without any simulations or > detailed study I'm so inclined to implement it instead of earlier > '2-page cell structure' which was so thoroughly studied. > I think that after the initial implementation it shouldn't be too > difficult to implement both (or maybe others) of these storage schemes > for proper benchmarks. > I read a paper "Adaptive Main Memory Compression" by Irina Chihaia and Thomas Gross, where they implement something they call "zones" to handle memory. In short, what they do is the same as you proposed earlier (linking fragments of compressed pages), but with the added locality of a zone, so that you can increase and decrease the size of the cache by adding or removing a zone (all fragments must be within one zone, which can be comprised of more than one page). Just to comment on their approach: they focus a lot on large-dataset applications and only them; IMHO we can do as well or better on large datasets and still not throw away performance for smaller ones.. > Please post any ideas you have for ccache. > I have not been too much help so far, as I'm still reading up on everything I need (and writing a summary of it), but I will try to get some of my ideas out there :). Mvh, Asbjørn Sannes |
From: Nitin G. <nit...@gm...> - 2006-01-30 10:35:38
|
Hi All, Just took some days off from compressed cache. Back to work now :) Presently I've just coded a 'brain dump' implementation (CVSROOT/linux26) -- compiles cleanly but crashes as soon as swapping begins. It just turned out to be very difficult to use the page cache radix trees themselves for storing location information for compressed pages. Now, the primary aim would be to get this page cache handling scheme working, and then it should be easy to actually compress/decompress pages. Also, the current implementation is going towards a 'chunked' structure of ccache. It almost seems like a joke that without any simulations or detailed study I'm so inclined to implement it instead of the earlier '2-page cell structure' which was so thoroughly studied. I think that after the initial implementation it shouldn't be too difficult to implement both (or maybe others) of these storage schemes for proper benchmarks. Also, I still don't have a way to handle swap cache pages. Although the obvious way of just using the swapper space tree (analogous to page cache handling) exists, this has problems as I mentioned in that 'Initial design' doc. As mentioned in the 'Initial design' doc, page cache and swap cache pages will be maintained in separate compressed caches (maybe with different structures, heuristic), so implementation for handling swap cache and page cache pages will be quite independent. So, the immediate goal will be handling page cache pages (clean and dirty) only and working on swap cache pages when we have a proper design. Presently there are no #ifdef CONFIG_CCACHE ... #endif blocks -- will think about this thing later -- for now just work for a working code. Also, looking forward to what's new in the upcoming 2.6.16 :) Please post any ideas you have for ccache. -- Nitin |
From: Nitin G. <nit...@gm...> - 2006-01-18 20:43:31
|
Hi Sannes, Great to see you being so interested in this project :) > I understand you are doing this for your master project?, since I'm > also doing this for my master (well, I haven't really gotten very > started, just read papers and looked at the VM code and CC for Linux > 2.4), I hope we can cooperate in the near future to make this a really > good implementation worthy of main-line inclusion (it would be nice, > don't you think?). Definitely, we can work together on this to have a really good implementation :) (2.6 kernels are already accused of being 'unstable' -- ccache in mainline is highly improbable even if we come up with a dream implementation. But, I think, we can definitely come up with a nice way of distributing it (as LiveCDs?) to make it more popular if we really come up with a good implementation ;) > > I'm on jabber: as...@ja..., msn: ac...@sa... or the > #linux-mm-cc irc channel as sannes. > > I've skimmed your e-mails about design and such, I'm going to read it > more carefully soon. I'm on #linux-mm-cc as nitin. Presently, I'm trying to get a bare-bones implementation of what I have written (especially for handling page cache pages). Please post your comments on that design doc to lc-devel so that we can have a discussion on it. It is unfortunate that the mailing list archives show the 'ccache - initial design' mail as an empty message (though lc-devel members have received it correctly) -- it could otherwise have been available to non-members too, with a chance to get feedback from other developers as well. > > My plan for the near future is to write a short essay (10 pages) about > CC past, present and future which is mandatory work for me, and after > that start coding and playing around with it to get a good grip on things. > While researching ccache, please post anything interesting you find to lc-devel so that we can discuss it. A general article on ccache would be nice too :) Also, it's really unfortunate that Rodrigo will not be working on this project. His involvement would have been really helpful (and fun), but that should not stop us working on this in full spirits :) Nitin |
From: <asb...@if...> - 2006-01-17 10:14:20
|
Rodrigo S de Castro wrote: > Hi Nitin, Asbjørn, and Bala, > > Recently you showed interest to develop the compressed cache for Linux 2.6 > and I couldn't assure, at the time you joined the project, that I would be > able to help, since I had to make a decision about some opportunities I > had. > > Unfortunately the opportunity I chose for my professional future in the > near future will not allow me to help the development of the compressed > cache at this moment. However, I wish you good luck in its development and > I will be available to help with any doubts about the 2.4 implementation. > > That is unfortunate, but I doubt it will dull the spirit of developing CC for 2.6 from my point of view :) Good luck with your career! :) Mvh, Asbjørn Sannes |
From: Rodrigo S de C. <ro...@te...> - 2006-01-16 16:24:41
|
Hi Nitin, Asbjørn, and Bala, Recently you showed interest to develop the compressed cache for Linux 2.6 and I couldn't assure, at the time you joined the project, that I would be able to help, since I had to make a decision about some opportunities I had. Unfortunately the opportunity I chose for my professional future in the near future will not allow me to help the development of the compressed cache at this moment. However, I wish you good luck in its development and I will be available to help with any doubts about the 2.4 implementation. Best regards, Rodrigo |
From: Nitin G. <nit...@gm...> - 2006-01-13 20:06:04
|
Compressed Cache

Initial Design (for 2.6 kernels)

Nitin Gupta

This document describes initial design ideas for the compressed cache for 2.6 kernels. Some of the ideas are based on the previous implementation and various other sources. Nothing has yet been implemented. We have decided to completely re-write the previous implementation to have maximum flexibility to try out new ideas and take optimum advantage of the many new features 2.6 kernels provide. Your feedback shall be really helpful.

The development will be based on vanilla kernels only.

General Background

The system maintains two LRU lists -- the active and inactive LRU lists. These lists may contain both page-cache (file-backed) and swap-cache (anonymous) pages. When under memory pressure, pages in the inactive list are freed as follows:

- swap-cache pages are written out to swap disks using the swapper_space writepage() (swap_writepage()).
- Dirty page-cache pages are flushed to filesystem disks using the filesystem-specific writepage().
- Clean page-cache pages are simply freed.

For compressed cache to be effective, it needs to store both swap-cache and page-cache (clean and dirty) pages. So, a way is needed to transparently (i.e. with changes required within the VMM subsystem only) take these pages in/out of the compressed cache.

During Swap-Out

shrink_cache() prepares a list of pages to be freed (these pages are from the inactive list) and hands over this list to shrink_list(), which then tries to free pages in the list, a page at a time, handling swap-cache pages and (clean/dirty) page-cache pages as above.

[two diagrams in the original document]

During Swap-In

If the page is anonymous, do_swap_page() looks up the swap cache first (using the swp_entry_t stored in the pte as key); if it is not in the swap cache, the required page is read in from the swap disk (with some readahead), added to the swap cache, and the page is returned (mapped to the process' VMA).

For file-backed pages the same logic applies (do_file_page()): if the page is file-backed, the page cache is looked up first (using the offset within the file stored in the pte as key); if it is not in the page cache, the required page is read in from the filesystem disk (with some readahead), added to the page cache and returned (mapped to the process' VMA).

[diagram in the original document]

(Open question: what kind of pages have pte_none(*pte) true?)

Handling Page Cache pages

Taking page cache pages to compressed cache

[diagram in the original document]

This is discussed in detail below. The ccache heuristic is responsible for determining if adding the page to the compressed cache is worth it. This is where the compressed cache adaptivity behaviour will be implemented. Before I discuss how I intend to implement data structures for ccache lookup and storage, the following gives a brief view of how the page cache and swap cache are maintained in 2.6 kernels.

About Page Cache

Please see: http://www.linuxsymposium.org/2005/linuxsymposium_procv2.pdf -- paper 'Examining Linux 2.6 Page-Cache Performance'. Each open file has a separate radix tree to maintain its pages in the page cache. So, in effect, each open file has its own page cache. The offset within the file is used as the key to locate the corresponding page in memory.

About Swap Cache

All swap cache pages are part of a single swapper_space -- a single radix tree maintains all pages in the swap cache. swp_entry_t is used as the key to locate the corresponding pages in memory.

[diagram in the original document]

The 'type' field identifies things we can swap to.

Now, to handle page cache pages

When a page is to be flushed to disk (dirty) or just freed (clean), determine if it is to be added to the compressed cache (ccache_heuristic()). If yes, then compress it (compress_page()) and add it to the compressed cache (add_to_ccache()).

add_to_ccache() will not maintain separate data structures to track the location of various pages in ccache. Instead it will modify the radix tree node to point to a new struct page which will contain all the information needed to locate the page in ccache. This new struct page will also have a flag set (PG_compressed -- a new page flag we will define) to identify that it does not point to a real page but is a container for the information required to locate it in ccache (compare this with buffer heads -- a very similar thing).

So, when a page is looked up in the page cache, it is determined whether it is in ccache just by checking the PG_compressed flag. In that case the struct page contains the information to locate the corresponding page in ccache. The page will be taken from the ccache, decompressed to a (newly allocated) page, and the radix tree node will again be made to point to this newly decompressed page.

[two diagrams in the original document]

This approach eliminates the need for double lookups. If a page is not in the page cache, it need not be searched for again (in whatever data structure could have been used to maintain location information for ccache) to see if it is in ccache. We just need to check the PG_compressed flag to determine if it is in ccache. Also, if this flag is set, the corresponding struct page has all the info required to locate the page in ccache.

Handling Swap Cache pages

We could have taken a completely analogous approach to handle swap cache pages, i.e.: when a swap cache page is to be moved to the swap disk, mark it as PG_compressed and modify the node as above to point to a new struct page which contains all the info to locate the page in ccache. When the page is again looked up in the swap cache during a page fault, simply note if PG_compressed is set; decompress, clear PG_compressed, set the node to again point to this decompressed page and return this page.

But this has some problems:

- If swap support has been disabled (!CONFIG_SWAP) there is no swap cache.
- If we force CONFIG_SWAP to be always defined, we still have a problem. In this case, the key value for tree lookup (swp_entry_t) will be used by both ccache and the swap cache (if present). Thus we have the problem of running out of swap addresses even if we have tons of swap disk space available (the previous implementation used 'virtual swap' addressing but I have not studied it).
- The entire swap handling code is a big heuristic that aims to minimize disk seeks -- disk seeks are not our problem. If we depend on this swap code to provide us the key values (swp_entry_t), we'll have to use the existing code while we have totally different criteria for which pages to swap out, how many, and which pages to read in during swap-in readahead, etc.
- If we implement our own algorithm to determine these key values in cases where we want to move a page to ccache, we will have collisions with key values generated by the swap code. To avoid these collisions we can store location information for ccache (for swap cache) separately from swapper_space, but that will force us to do double lookups -- if a page is not in the swap cache, look in ccache too -- and this is not good.

Solution

I have no real solution yet. But what we want is this:

- Keep ccache working even if no swap disk is present.
- Don't allow algorithms implemented for swap disks to affect which pages go to ccache, i.e. implement different algorithms to determine how many pages from the inactive LRU to move to ccache, which pages to read in during swap-in readahead, etc.
- Avoid double lookups, i.e. "if page is not in swap cache, check in ccache" -- this, I think, can always be avoided.

Following is a possible solution to all the above problems. It might not even be a solution; I have not thought it out in detail:

We have MAX_SWAPFILES (default 32) swap disks differentiated using the 'type' field (5 bits) in swp_entry_t. Mark one (or some) of them as 'virtual swap disks' and assign it the highest priority of all swap disks. The swapper_space radix tree can then be used for ccache too (i.e. no double lookups). This will also allow us to switch swapping algorithms based on the 'kind' of swap disk. We can then use the rest of the 27 bits as an index into ccache.

One initial problem is: we need <struct page, start offset, end offset> to locate a page in ccache and we have 27 bits only, so we can't even store a pointer to a struct containing all this info.

This way, we'll be sure that we never collide ccache keys with swap disk keys ('segment' no. in swap disk) in the swapper_space radix tree, since real and virtual swap disks use different 'type' fields (and swp_entry_t is a combination of the 'type' and 'offset' fields).

Compressed Cache Structure

The previous implementation has a 2-page cell structure for the compressed cache. Please see: http://linuxcompressed.sourceforge.net/files/docs/paper.pdf for details about this structure.

Although this structure is the result of extensive study and practical experience, I've been working on an alternate design for quite some time. These are the main problems I see with the 2-page cell structure:

-- Need to search for a cell with total free space big enough to hold the compressed page.
-- Need to do compaction when a page is taken out.
-- Under heavy memory load it can be difficult to allocate two contiguous pages.
-- Lock contention problems (haven't studied this well).

The idea is to let a compressed page be written in chunks in the free areas as they occur in ccache.

It will eliminate the need for searching for free space when a compressed page is to be written to ccache, requires no compaction, and allocates a single page at a time.

The following describes the idea:

--> During page-in (taking a page out of ccache):

-- Locate all the chunks associated with the required page and decompress them.
-- Merge free chunks if found adjacent. Maintain a list of free chunks as <struct page, offset, len> (possibly maintain different lists according to chunk sizes, e.g. 'good' >= 1/2*PAGE_SIZE, 'bad' >= 1/4*PAGE_SIZE and the 'ugly' for chunks < bad).
-- If a "good" chunk was just freed, set this as the 'start' position for the next write (writing a compressed page into ccache). This will prevent searching the free list every time from the beginning, which would otherwise make the fragmentation problem very severe.

--> During page-out (storing a page into ccache):

-- Compress the page and start writing it out beginning at the 'start' position (this is set during page-in and is at the start of ccache initially).
-- Beginning with the chunk at 'start', take more free chunks (without considering their size) from the free list to write out the compressed page.
-- To prevent "metadata madness", limit the number of chunks per compressed page (say, MAX_CHUNKS=4).
-- If the number of chunks required > MAX_CHUNKS, simply write out the last chunk at the 'tail' of ccache.
-- Fix up the free list to reflect the chunks used and the space left over after the last chunk.

This can (I'm not sure) also reduce lock contention as locks can now be more fine grained.

Though it causes space overhead due to the need to maintain info about all the chunks (up to MAX_CHUNKS) associated with a compressed page, it should significantly improve speed -- it eliminates the need to search for free space when a compressed page is to be written to ccache, requires no compaction, and allocates a single page at a time.

The following shows a sample execution of this algorithm:

[diagram in the original document]

General Remarks

There are some more things to consider. Some of these were also considered during the previous implementation but I am not sure of the solutions:

-- Send pages to the swap disk (when the compressed cache is 'full' -- has reached some maximum size) as compressed or uncompressed? The previous implementation had an option to send compressed pages to the swap disk as-is (with null-padding to align to the page boundary in swap). This delayed decompressing the page until really required. The goal of sending the page to swap compressed is not to have space gains (null-padding has been done) but to avoid unnecessary overhead.
-- As described in this document, the compressed cache will be maintained separately for swap cache and page cache pages. This has the benefit of a simplified design and reduced lock contention, and should also be useful for a more effective heuristic which can now treat swap and page cache pages separately.
-- Also mentioned in this document: pages will be placed (compressed and chunked-out) in ccache asynchronously.
-- I am not clear on how to allow the compression algorithm to be changed at runtime.
-- Can the compressed cache also hold uncompressed pages (for pages that are incompressible)?
-- Compression algorithm? WK4x4 / LZO / ?
-- Implementing the above ccache structure (if we really decide that it's actually good) will be hard. Implementing the 2-page cell structure is not trivial either. What structure do you think we should begin implementing?

I think for the initial implementation, we can try handling only page cache pages (as it turns out that we have confusion over how to handle swap cache pages). Since we're maintaining the compressed cache for page cache and swap cache separately, this initial implementation will not cause deviation from the design. |
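A purely illustrative sketch of the 'virtual swap type' idea above follows; CCACHE_SWP_TYPE and the helper names are assumptions, while swp_entry()/swp_type() and MAX_SWAPFILES are real 2.6 kernel definitions.

#include <linux/swap.h>
#include <linux/swapops.h>

/* Assumed: one swap 'type' slot reserved for the compressed cache. */
#define CCACHE_SWP_TYPE		(MAX_SWAPFILES - 1)

/* Build a swp_entry_t whose offset bits index into ccache instead of a
 * position on a real swap device. */
static inline swp_entry_t ccache_swp_entry(unsigned long ccache_index)
{
	return swp_entry(CCACHE_SWP_TYPE, ccache_index);
}

/* True if this entry refers to the virtual (ccache) swap device rather
 * than a real swap disk. */
static inline int is_ccache_swp_entry(swp_entry_t entry)
{
	return swp_type(entry) == CCACHE_SWP_TYPE;
}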
From: Nitin G. <nit...@gm...> - 2006-01-01 18:40:21
|
Hi Rodrigo, I had a careful look at VMM code (2.6.14.3) for last few days and it seems to be somewhat complex. Properly understanding it will take more time than = I expected. I think a static compressed cache handling only swap-cache and using the same 2-page cell structure will be nice "Ver 0.1". After this initial implementation will be ready to try out all the new ideas we have. Basically I need some experience with VMM code to really come up with something we can say "Compressed Cache for 2.6 kernels". I have many new ideas for ccache but cannot determine if they are crap or practical to implement and can really help overall system performance. After this initial implementation, I think we can gradually revamp it to include adaptivity, handling page-cache pages, and other things of concern. I have also been working on another ccache structure for some time now. These are the main problems I see with 2-page cell structure. -- Need to search for cell with total free space big enough to hold compressed page. -- Need to do compaction when a page is taken out. -- Under heavy memory load it can be difficult to allocate two contiguous pages. -- Lock contention problems (haven't studied this well). The idea is to let a compressed page be written in chunks in the free areas as they occur in ccache. The ideas and the algorithm are taken partly from- 1) the way linux handles in-flight block I/O - single buffer is scattered across many pages in variable sized chunks. It can help in (similar) implementation of ccache structure. 2) log-based filesystems - It searches for ideas to minimize disk seeks during disk write and minimize fragmentation. We are not concerned about disk seeks but ideas can help avoid traversing free lists and particularly good are fragmentation avoidance methods. 3) buddy allocator - Merge together adjacent free blocks and maintain separate lists according to 'order' of pages. It will eliminate need for searching for free space when compressed page is to be written to ccache, no need for compaction and allocates single page a= t a time. Following describes the idea: --> During page-in (taking page out of ccache): -- Locate all the chunks associated with required page and decompress them. -- Merge free chunks if found adjacent. Maintain a list of free chunks as <struct page, offset, len> (possibly maintain different list according to chunk sizes e.g. 'good'>=3D1/2*PAGE_SIZE, 'bad'>=3D1/4*PAGE_SIZE and the 'u= gly' for chunks < bad). -- If a "good" chunk was just freed, set this as 'start' position for write (writing compressed page in ccache). This will prevent searching free list every time from beginning which will otherwise cause fragmentation problem very severe. --> During page-out (storing page into ccache): -- Compress page and start writing out beginning with 'start' position (thi= s is set during page-in and is at start of ccache initially). -- Beginning with chunk at 'start', take more free chunks (without considering their size) from free list to write out compressed page. -- To prevent "metadata madness" limit the no. of chunks per compressed pag= e (say, MAX_CHUNKS=3D4). -- If no. of chunks required > MAX_CHUNKS, simply write out the last chunk at 'tail' of ccache. -- Fix-up the free list to reflect chunks used and space left over after th= e last chunk. This can (i'm not sure) also reduce lock contentions as locks can now be more fine grained. 
Though it causes space overhead due to the need to maintain info about all the chunks (up to MAX_CHUNKS) associated with a compressed page, it should significantly improve speed (for the reasons mentioned above). Also, I have some more interesting papers that can help with ccache, but as I mentioned in the beginning, my first objective is to get a static ccache working, handling only swap-cache pages and using the 2-page cell structure, so that I first gain some experience with VMM code. Happy New Year!! Best Regards, Nitin |
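A minimal userspace sketch of the chunked layout proposed in the message above: a compressed page is scattered over at most MAX_CHUNKS regions taken from a free-chunk list, and anything that would need more chunks is spilled as one contiguous piece at the 'tail'. The structure names, the malloc standing in for the tail of ccache, and the absence of locking and of the good/bad/ugly size classes are simplifications assumed for illustration.

#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE  4096
#define MAX_CHUNKS 4            /* cap on metadata per compressed page */

/* A region inside one backing page of the compressed cache.  In the kernel this
 * would be <struct page *, offset, len>; a plain pointer stands in for the page. */
struct chunk {
	unsigned char *base;
	size_t offset;
	size_t len;
	struct chunk *next;     /* free-list linkage (unused in the per-page table) */
};

/* Where the bytes of one compressed page ended up. */
struct cc_page {
	int nr_chunks;
	struct chunk used[MAX_CHUNKS];
};

/* Head of the free-chunk list doubles as the 'start' position: page-in pushes a
 * freshly freed "good" chunk here so the next page-out begins writing there. */
static struct chunk *free_list;

/* Scatter len compressed bytes over at most MAX_CHUNKS - 1 free chunks; whatever
 * is left over is written as one contiguous piece at the tail (a plain malloc here). */
static int cc_write_chunked(struct cc_page *meta, const unsigned char *buf, size_t len)
{
	size_t done = 0;

	meta->nr_chunks = 0;
	while (done < len && meta->nr_chunks < MAX_CHUNKS - 1 && free_list) {
		struct chunk *c = free_list;
		size_t take = (len - done < c->len) ? len - done : c->len;

		memcpy(c->base + c->offset, buf + done, take);
		meta->used[meta->nr_chunks++] = (struct chunk){ c->base, c->offset, take, NULL };
		done += take;

		if (take == c->len) {           /* chunk fully consumed: unlink it */
			free_list = c->next;
			free(c);
		} else {                        /* partially used: shrink it in place */
			c->offset += take;
			c->len -= take;
		}
	}

	if (done < len) {                       /* spill the remainder at the 'tail' */
		unsigned char *tail = malloc(len - done);

		if (!tail)
			return -1;
		memcpy(tail, buf + done, len - done);
		meta->used[meta->nr_chunks++] = (struct chunk){ tail, 0, len - done, NULL };
	}
	return 0;
}

On page-in the freed chunks would be pushed back onto free_list (merging adjacent neighbours), which is what keeps the next page-out writing near the last freed "good" chunk instead of walking the whole list.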
From: Rodrigo S de C. <ro...@te...> - 2005-12-21 17:05:11
|
Hi Nitin, On Wednesday 21 December 2005 13:43, you wrote: > The exams are now nearly over and I can now work full time on > Compressed Cache :). Thanks for adding me as a developer in this > project! Good news! Welcome as a new developer :-) I still don't have news about the chances to get back to the project, and I also ended up starting a new project in the last weeks that is very time consuming. But I still intend to help you as soon as you start coding. > I've just started looking at VMM code and your previous > implementation. Now, I am working on collecting all the things done up > to now and making up a general workflow for this work (I'll post it > very soon). Great! > Also, I have very little experience working with CVS and it appears to > be a complex tool. So, it can take me some time to get working > smoothly with online repository. So, also learning CVS to get some > hang of it. Sorry, I can see here some trouble for you :). CVS is easy. Most of the time you will use only update and commit. Check CVSBook (google it) for more information about development with CVS. > While reading your code, I just noticed that you have also made > changes outside mm/ like in fs/inode.c, fs/smbfs/dir.c and some others > in fs/. Did the implementation required changes to other kernel parts > which had to do something with caches? I don't even know what is being > done but just thought it would be nightmare to maintain it if it > requires changes all over the kernel. Maybe, I haven't understood your > idea there. I think, compressed cache should be completely transparent > to rest to kernel subsystems. Some kernel parts perform part of memory management (in particular the eviction or flushing path) and we need to change them. That is particularly true for a compressed cache that stores any page from the page cache (and not only swap backed ones). I took a brief look at the code and it doesn't seem to have many changes though, basically only to flush given compressed pages from the compressed cache. I can't remember at this moment why I did it, but probably there is a very sensible reason based on the implemented architecture. > Also, do you think we need to work out a way to actually store huge > pages in compressed cache. ? I currently only have a little intuitive > understanding of impact of compressed cache on other caches but I > think, after some initial development we will be able to take benefit > of many things 2.6 kernels offer. Since the user must reserve space prior to using huge pages, it doesn't seem we have to worry much about it. Maybe we could try to shrink the cache if the user requests more huge pages, or have some other policy to avoid introducing impact for applications that use huge pages. As Dilma mentioned, I also had the feeling that huge page support is still in its infancy, but maybe in the future better support for them might make us rethink our support in compressed cache. Best regards, -- Rodrigo S. de Castro ro...@te... 8301-5944 |
From: Nitin G. <nit...@gm...> - 2005-12-21 15:44:07
|
Hi Rodrigo, The exams are now nearly over and I can now work full time on Compressed Cache :). Thanks for adding me as a developer in this project! I've just started looking at the VMM code and your previous implementation. Now, I am working on collecting all the things done up to now and making up a general workflow for this work (I'll post it very soon). Also, I have very little experience working with CVS and it appears to be a complex tool. So, it can take me some time to get working smoothly with the online repository. So, I am also learning CVS to get the hang of it. Sorry, I can see some trouble for you here :). While reading your code, I noticed that you have also made changes outside mm/, like in fs/inode.c, fs/smbfs/dir.c and some others in fs/. Did the implementation require changes to other kernel parts that have something to do with caches? I don't even know what is being done, but I just thought it would be a nightmare to maintain if it requires changes all over the kernel. Maybe I haven't understood your idea there. I think compressed cache should be completely transparent to the rest of the kernel subsystems. Also, do you think we need to work out a way to actually store huge pages in compressed cache? I currently have only a little intuitive understanding of the impact of compressed cache on other caches, but I think after some initial development we will be able to take advantage of many things 2.6 kernels offer. Best Regards, Nitin |
From: Nitin G. <nit...@gm...> - 2005-11-30 17:46:24
|
Hi Rodrigo, As the mail from Dilma DaSilva suggests, huge page support can undergo many changes in the (near) future. So, apart from studying the impact of compressed cache on applications using huge pages, we should also work out a cache structure for actually storing these 4M pages. Although we may not implement it now, it will be useful to have a way to handle them without affecting the existing cell structure too much, so that it is easier to incorporate later if need be. Before I talked to wli (yes, he was William Irwin (wli), as in the maintainers file :) ), I thought of a simple way of handling 4M pages: simply store them separately and also maintain the corresponding metadata separately. When a huge page comes to be swapped out, compress it page-by-page, allocating single (4K) pages as required. When another huge page comes, start by allocating a new page and do not touch the space left over from storing the previous huge page, i.e. let the space in the last page be wasted (I think this will not be that costly and will greatly simplify the design). When a huge page is to be taken out, simply decompress it and free the associated pages in the compressed cache. One immediate problem I can see here is that while compressing a huge page, pages have to be allocated on demand. If a huge page is being swapped out, the system must be really tight on memory, and until the huge page is completely compressed it just cannot be freed. But once it is completely compressed, the system will gain not only from the saved space but also from larger contiguous space availability. This "design" also eliminates the need for compaction and is completely separate from the cache for normal 4K pages. This is just an initial idea. I haven't thought it out in detail but just wanted to tell you this before I go invisible for a few weeks :) Best Regards, Nitin On 11/30/05, Rodrigo S de Castro <ro...@te...> wrote: > Hi Nitin, > > On Monday 28 November 2005 18:36, Nitin Gupta wrote: > > Hi Rodrigo, > > We need not think about handling 4M pages. I fortunately had a talk > > with HugeTLB Filesystem maintainer himself at #kernelnewbies! This is > > what he told: Huge pages are never paged out. They are just paged in > > on demand (lazy mmap()'ing). > > Great that you talked to him. Is the maintainer William Irwin (wli), as in the > maintainers file? He is very well-known in kernel world. > > If huge pages are not paged out, we must find out the impact on them by using a > portion of the memory for compressed cache. In the first implementations, > when I was only compressing swap pages, the impact on the other page cache > pages was severe. Also on buffers, that ended up being written much more > often. We must understand his implementation better to know if there is a > chance that we introduce a negative impact on applications that may use huge > pages. > > > Also, 2.6 kernels have a thrashing protection algorithm implemented > > (mm/thrash.c) (based on http://www.cs.wm.edu/~sjiang/token.pdf). This > > should significantly improve compressed cache performance as it will > > reduce no. of "false LRU pages" (as paper calls them) reaching > > compressed cache and hence lesser no. of unnecessary > > compression/decompressions. > > Oh, I read this paper. Sometime ago, I thought about implementing it after Rik > van Riel mentioned that to me, but unfortunately things changed and I > couldn't spend more time on kernel development at that time. > > I agree that it may help compressed cache. 
However, perhaps some changes = to > the amount of time that an application holds the swap token may happen > depending on the compressed cache size and on how this value is initially > defined in the implementation (I didn't read the code yet). > > > Just when everything was ready - latest vanilla kernel with all > > debugging turned on, my End-Term exams have struck. They are now > > dangerously close (3 Dec). So, I am sorry to tell you that I may not > > be able to work on it till 25 Dec. > > First finish all your exams, then you will be free to work intensively :-= ) > Good luck with them! > > > So, I think CVS branch can then be started for real work :) > > (my sf user: nitin_sf) > > You are now a developer in the SF project. > > > Anyway, I will try reading more of existing code and keep up with > > relevant kernel updates and keep you informed. > > Excellent. I will try to find more about huge pages and swap token impact= and > will let you know too. > > Best Regards > > > On 11/27/05, Rodrigo S de Castro <ro...@te...> wrote: > > > Hi Nitin, > > > > > > I am very sorry for the delay. I had restricted internet access this = week > > > and couldn't answer your email earlier. > > > > > > If we are going to do a new implementation, I agree with the static > > > compressed cache approach. This was the first step of the current 2.4 > > > implementation and I think it was very useful. Actually, even if we w= ere > > > to port the current code, I think that disabling adaptivity would be = very > > > useful to make it work stably initially and then we add and improve t= he > > > adaptivity heuristic. > > > > > > Concerning the kernel version, I would go with the latest vanilla ver= sion > > > and would update for every release, since I don't think that it would > > > have many differences in -ac/-mm branches that could justify the effo= rt > > > of keeping updated with them. In the past, during 2.4 version > > > implementation, we achieved a point that some people asked to keep > > > updated with -ac branch, but it was already somewhat stable and peopl= e > > > were using it. Therefore, I think your approach would work. > > > > > > This week I discussed a little bit about compressed cache in Linux 2.= 6 > > > and there is something that we must think when implementing this new > > > version. In 2.6 there is support for huge pages (4Mb pages instead of > > > 4Kb, at least on i386). We didn't have that in 2.4 and we must devise= a > > > way to handle them. > > > > > > Finally, great to hear about taking compressed cache as your final > > > university project. I really hope we can work together on this. > > > > > > PS: Tell me your sourceforge user so I can add as a developer in the > > > project if you are about to start coding. Let's manage the CVS reposi= tory > > > with directories or branches for each implementation (2.4/2.6). > > > > > > Best regards, > > > > > > Rodrigo > > > > > > -----Original Message----- > > > From: Nitin Gupta [mailto:nit...@gm...] > > > Sent: ter=E7a-feira, 22 de novembro de 2005 08:36 > > > To: Rodrigo S de Castro > > > Cc: lin...@li... > > > Subject: Re: Compressed cache status > > > > > > Hi Rodrigo, > > > > > > I think the coding can now start from next week. > > > > > > For the compressed structure, I think the 2-page cell structure would > > > be best as you have done a lot of research here and my guess-work wil= l > > > not help here. 
> > > > > > For the implementation, I think first getting static compressed cache > > > implemented for 2.6 kernel would be very useful. I think, by actually > > > working on VM code will give us a clear understanding of all the > > > relevant recent kernel changes. Although you have a lot of experience > > > working on VM code but for me, it would be very difficult to > > > practically come-up with a good heuristic now and go on with > > > implementing a dynamic compressed cache. Also, this will allow us to > > > parallelize the work on heuristic without delaying the work. > > > > > > Also, I think working on a new implementation instead of just trying > > > to port your previous work would be better since older bugs may becom= e > > > very hard to trace down and this will also give more flexibility in > > > implementation. > > > > > > To start with the implementation should we start with the latest > > > vanilla kernel or -ac/-mm? > > > In my previous work I went with the latest vanilla and updated it > > > (about) every 3 weeks to keep the work current. Do you think this > > > approach will work here? > > > > > > I have also taken-up this as my final year university project and as > > > an entry for a competition sponsored by Red Hat > > > (http://ekalavya.iitb.ac.in/rhs/ - Group RHS051032). So, it will be > > > great to have you as mentor! > > > > > > > > > Best Regards, > > > > > > Nitin > > > > > > On 11/20/05, Rodrigo S de Castro <ro...@te...> wrote: > > > > Hi Nitin, > > > > > > > > As I mentioned, there is a possibility that I return to compressed > > > > cache development. In the next two or three weeks I will have a > > > > decision about this. > > > > > > > > Concerning compressed cache structure, the fragmentation is in the > > > > cells > > > > > > (ie > > > > > > > contiguous pages), and some metadata keep track of the fragmented s= pace > > > > in the cell. Of course there is a small memory space overhead to ke= ep > > > > this > > > > > > data > > > > > > > and a performance penalty, but it is negligible given the benefit w= e > > > > have (some benchmark results would back this statement). What is > > > > interesting in this metadata is that we don't have much overhead wi= th > > > > the fragmentation > > > > > > and > > > > > > > the compaction. If it is worth to compact a cell rather than alloca= te a > > > > > > new > > > > > > > one or evict a compressed page, we do it after a quick hash table > > > > lookup. Moving data is expensive, but it is less expensive than > > > > eviction or a new page allocation (remember we are always under mem= ory > > > > pressure). > > > > > > > > In case of too much fragmentation in the compressed cache (and we d= on't > > > > > > have > > > > > > > any single cell whose compaction would result in enough free space)= , a > > > > scheme to perform compaction moving compressed pages from a cell to > > > > > > another > > > > > > > could improve a lot the allocated memory space, but I am not sure t= he > > > > cost would be affordable. Moreover, I am not sure this is a common > > > > case, > > > > > > anyway, > > > > > > > but it is very simple to keep track of the overall free space to ha= ve a > > > > feeling of it after some tests. > > > > > > > > Your ideas for the use of compressed cache are very interesting. 
I > > > > would > > > > > > say > > > > > > > that its usage in PDA/Cell phones is also something that may be > > > > > > interesting > > > > > > > (even though, in this case, performance improvement may not be the > > > > primary goal). > > > > > > > > I went through the 2.6 memory code recently, but very briefly. I ha= d > > > > the impression that it does not have substantial changes in the > > > > eviction path, except the inclusion of rmap. We could figure out ho= w > > > > knowing the > > > > > > processing > > > > > > > mapping a page could help us. Another research topic would be check= ing > > > > kVMTrace developed by Scott Kaplan. Maybe it could help us improvin= g > > > > the heuristic to avoid maladaptivity, but still I don't know much a= bout > > > > it to advance our discussion at this moment. > > > > > > > > As a general guideline, I would suggest that you take into > > > > consideration > > > > > > the > > > > > > > design decisions I took in this work. They are very important and a= re > > > > all the lessons I learned during the implementation and my research= : > > > > > > > > - Cells composed of 2 contiguous pages > > > > - Page cache (not only swap cache) > > > > - Heuristic to try to disable compressed cache when it is not usefu= l > > > > - Virtual swap address > > > > - Page order > > > > - Possibility of swap compression > > > > - Idea behind profit and cost lists in the heuristic > > > > - Scheduling impact > > > > - Buffer/IO operations impact > > > > - Adjustment of Linux watermarks > > > > > > > > Having all these points in mind will let us to start a port or new > > > > implementation far ahead. In summary, the main lesson is that the > > > > system > > > > > > is > > > > > > > very much sensitive to any changes in it (mainly reduction of memor= y > > > > for other caches). > > > > > > > > Let's keep in touch. As soon as I have a decision about the possibi= lity > > > > of working with you on this, I will let you know. > > > > > > > > Best regards, > > > > > > > > Rodrigo > > > > > > > > -----Original Message----- > > > > From: Nitin Gupta [mailto:nit...@gm...] > > > > Sent: quarta-feira, 16 de novembro de 2005 12:47 > > > > To: ro...@te... > > > > Cc: lin...@li... > > > > Subject: Compressed cache status > > > > > > > > Hi Rodrigo, > > > > > > > > It was great to see your reply! > > > > I've been working on compressed cache port to 2.6 for about 2 > > > > months whenever I can take out the time. Currently not a single lin= e > > > > of code has been written. Since I was new to VMM subsystem, the > > > > initial study took too much time. The details seemed overwhelming b= ut > > > > after a lot of work the picture is much clearer now. > > > > So, the current status is that I have read a lot on VMM and you= r > > > > paper on adaptive compressed caching. Now, I'm looking for the most > > > > recent changes in kernel that can help compressed cache design. > > > > Since, I've done some kernel patches (for CIFS vfs) before, whi= ch > > > > involved lot of concurrency issues, I have a general feel of kernel > > > > programming. So, I think once a complete "roadmap" is prepared, wor= k > > > > can really speed up. 
> > > > > > > > So, before I go hunting for the right kernel hooks, I wanted to > > > > have following on paper: > > > > > > > > - Compressed cache structure - currently the cell based design (2-4 > > > > pages taken as unit) has problem of fragmentation and relies on > > > > compaction as last resort to free-up memory before starting to > > > > swap-out to disks. Although, fragmentation cannot be eliminated but= I > > > > think a different design (for which I am looking for) can surely > > > > reduce it. > > > > Also, it needs to be addressable storage (kernel asks for p= age > > > > in swap at xxxxx where is this in compressed cache?). Although you = have > > > > implemented this but I couldn't get it. > > > > > > > > - Adaptivity heuristic - I've read your current heuristic method bu= t > > > > as you mentioned that rmap and other recent developments in 2.6 can= be > > > > really helpful here. So, I'm now reading more on recent development= s. > > > > Heuristic quality will be very significant and can determine succes= s > > > > of this work. As you have shown setting up just the right static ca= che > > > > size for a particular workload is so good for performance while the > > > > wrong choice produces significant slowdown. > > > > > > > > - Compression algorithm - these can simply be reused from your > > > > implementation. > > > > > > > > With these in hand, I can then proceed with implementation: find th= e > > > > right kernel hooks, implement a "dummy" compressed cache (not reall= y > > > > do anything but just to make sure I've the right entry points), and > > > > then gradually implement all the parts (from static to dynamic cach= e). > > > > > > > > Also, I'm particularly looking forward to its use in: > > > > - LiveCD environment - swap in not available and CD-ROM read speed = is > > > > exceptionally slow. > > > > - Virtualized environment - hard-disk is a file backed on host OS. = So, > > > > RAM and disc speed gap is even greater here (many more +ve factors > > > > here). > > > > > > > > In general i'm not very busy and can spare a lot of time for this > > > > work. But, unfortunately my semester exams are due to start early > > > > december and last for about 3 weeks. However, I'm extremely interes= ted > > > > in this project and it would be great if I can get your help. > > > > > > > > > > > > Best regards, > > > > > > > > Nitin > > |
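The 4M-page scheme from Nitin's message above, sketched in userspace C under the assumption of 4 MB huge pages split into 4 KB pieces: each piece is compressed independently, backing 4K buffers are allocated on demand, and whatever remains unused in the final buffer is simply wasted rather than shared with the next huge page. compress_piece() is a placeholder for WK4x4/LZO, and cc_store_huge is an assumed name, not existing code.

#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE      4096UL
#define HUGE_PAGE_SIZE (4UL * 1024 * 1024)          /* 4M huge page on i386 */
#define HUGE_PIECES    (HUGE_PAGE_SIZE / PAGE_SIZE)

/* Placeholder per-piece compressor (WK4x4/LZO in a real implementation). */
static size_t compress_piece(const unsigned char *in, unsigned char *out)
{
	memcpy(out, in, PAGE_SIZE);
	return PAGE_SIZE;
}

/* Compressed image of one huge page, kept apart from the normal 4K cache:
 * 4K buffers are allocated on demand, and the tail of the last buffer is
 * deliberately wasted rather than shared with the next huge page. */
struct cc_huge {
	unsigned char *bufs[HUGE_PIECES];   /* worst case: nothing compressed */
	int nr_bufs;
	size_t used_in_last;
};

static int cc_store_huge(struct cc_huge *h, const unsigned char *huge_page)
{
	unsigned char piece[PAGE_SIZE];
	size_t i;

	h->nr_bufs = 0;
	h->used_in_last = PAGE_SIZE;        /* forces an allocation for the first piece */

	for (i = 0; i < HUGE_PIECES; i++) {
		size_t len = compress_piece(huge_page + i * PAGE_SIZE, piece);
		size_t off = 0;

		while (off < len) {
			if (h->used_in_last == PAGE_SIZE) {     /* current buffer is full */
				h->bufs[h->nr_bufs] = malloc(PAGE_SIZE);
				if (!h->bufs[h->nr_bufs])
					return -1;      /* allocation failed under pressure */
				h->nr_bufs++;
				h->used_in_last = 0;
			}
			size_t room = PAGE_SIZE - h->used_in_last;
			size_t take = (len - off < room) ? len - off : room;

			memcpy(h->bufs[h->nr_bufs - 1] + h->used_in_last, piece + off, take);
			h->used_in_last += take;
			off += take;
		}
	}
	return 0;       /* leftover space in bufs[nr_bufs - 1] stays unused */
}

The on-demand malloc in the inner loop is exactly the problem pointed out in the message: every allocation happens while the system is already tight on memory, and the huge page cannot be freed until the whole loop completes.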
From: Dilma D. <di...@wa...> - 2005-11-30 14:05:02
|
In my opinion, the implementation for proper support of large pages (in Linux and other operating systems) is in its infancy; in order to allow applications to really take advantage of large pages (e.g. the reduced number of TLB entries), the support in Linux will have to advance (and in my opinion it will, as developers see the need for it). I think it makes sense to take large pages into consideration as the design for compressed caches is revisited now; I don't suggest addressing it in an implementation now, but thinking about it enough so that the many issues (such as the relevance of memory fragmentation, and the potential need to promote and demote memory areas from/to large/small pages) can be worked out later if needed. dilma |
From: Rodrigo S de C. <ro...@te...> - 2005-11-30 11:45:34
|
Hi Nitin, On Monday 28 November 2005 18:36, Nitin Gupta wrote: > Hi Rodrigo, > We need not think about handling 4M pages. I fortunately had a talk > with HugeTLB Filesystem maintainer himself at #kernelnewbies! This is > what he told: Huge pages are never paged out. They are just paged in > on demand (lazy mmap()'ing). Great that you talked to him. Is the maintaner William Irwin (wli), as in t= he=20 maintainers file? He is very well-known in kernel world. If huge pages are not page out, we must find out the impact on them by usin= g a=20 portion of the memory for compressed cache. In the first implementations,=20 when I was only compressing swap pages, the impact on the other page cache= =20 pages was severe. Also on buffers, that ended up being written much more=20 often. We must understand his implementation better to know if there is a=20 chance that we introduce a negative impact on applications that may use hug= e=20 pages. > Also, 2.6 kernels have a thrashing protection algorithm implemented > (mm/thrash.c) (based on http://www.cs.wm.edu/~sjiang/token.pdf). This > should significantly improve compressed cache performance as it will > reduce no. of "false LRU pages" (as paper calls them) reaching > compressed cache and hence lesser no. of unnecessary > compression/decompressions. Oh, I read this paper. Sometime ago, I thought about implementing it after = Rik=20 van Riel mentioned that to me, but unfortunately things changed and I=20 couldn't spend more time on kernel development at that time. I agree that it may help compressed cache. However, perhaps some changes to= =20 the amount of time that an application holds the swap token may happen=20 depending on the compressed cache size and on how this value is initially=20 defined in the implementation (I didn't read the code yet). > Just when everything was ready - latest vanilla kernel with all > debugging turned on, my End-Term exams have struck. They are now > dangerously close (3 Dec). So, I am sorry to tell you that I may not > be able to work on it till 25 Dec. =46irst finish all your exams, then you will be free to work intensively :-= )=20 Good luck with them! > So, I think CVS branch can then be started for real work :) > (my sf user: nitin_sf) You are now a developer in the SF project. > Anyway, I will try reading more of existing code and keep up with > relevant kernel updates and keep you informed. Excellent. I will try to find more about huge pages and swap token impact a= nd=20 will let you know too. Best Regards > On 11/27/05, Rodrigo S de Castro <ro...@te...> wrote: > > Hi Nitin, > > > > I am very sorry for the delay. I had restricted internet access this we= ek > > and couldn't answer your email earlier. > > > > If we are going to do a new implementation, I agree with the static > > compressed cache approach. This was the first step of the current 2.4 > > implementation and I think it was very useful. Actually, even if we were > > to port the current code, I think that disabling adaptivity would be ve= ry > > useful to make it work stably initially and then we add and improve the > > adaptivity heuristic. > > > > Concerning the kernel version, I would go with the latest vanilla versi= on > > and would update for every release, since I don't think that it would > > have many differences in -ac/-mm branches that could justify the effort > > of keeping updated with them. 
In the past, during 2.4 version > > implementation, we achieved a point that some people asked to keep > > updated with -ac branch, but it was already somewhat stable and people > > were using it. Therefore, I think your approach would work. > > > > This week I discussed a little bit about compressed cache in Linux 2.6 > > and there is something that we must think when implementing this new > > version. In 2.6 there is support for huge pages (4Mb pages instead of > > 4Kb, at least on i386). We didn't have that in 2.4 and we must devise a > > way to handle them. > > > > Finally, great to hear about taking compressed cache as your final > > university project. I really hope we can work together on this. > > > > PS: Tell me your sourceforge user so I can add as a developer in the > > project if you are about to start coding. Let's manage the CVS reposito= ry > > with directories or branches for each implementation (2.4/2.6). > > > > Best regards, > > > > Rodrigo > > > > -----Original Message----- > > From: Nitin Gupta [mailto:nit...@gm...] > > Sent: ter=E7a-feira, 22 de novembro de 2005 08:36 > > To: Rodrigo S de Castro > > Cc: lin...@li... > > Subject: Re: Compressed cache status > > > > Hi Rodrigo, > > > > I think the coding can now start from next week. > > > > For the compressed structure, I think the 2-page cell structure would > > be best as you have done a lot of research here and my guess-work will > > not help here. > > > > For the implementation, I think first getting static compressed cache > > implemented for 2.6 kernel would be very useful. I think, by actually > > working on VM code will give us a clear understanding of all the > > relevant recent kernel changes. Although you have a lot of experience > > working on VM code but for me, it would be very difficult to > > practically come-up with a good heuristic now and go on with > > implementing a dynamic compressed cache. Also, this will allow us to > > parallelize the work on heuristic without delaying the work. > > > > Also, I think working on a new implementation instead of just trying > > to port your previous work would be better since older bugs may become > > very hard to trace down and this will also give more flexibility in > > implementation. > > > > To start with the implementation should we start with the latest > > vanilla kernel or -ac/-mm? > > In my previous work I went with the latest vanilla and updated it > > (about) every 3 weeks to keep the work current. Do you think this > > approach will work here? > > > > I have also taken-up this as my final year university project and as > > an entry for a competition sponsored by Red Hat > > (http://ekalavya.iitb.ac.in/rhs/ - Group RHS051032). So, it will be > > great to have you as mentor! > > > > > > Best Regards, > > > > Nitin > > > > On 11/20/05, Rodrigo S de Castro <ro...@te...> wrote: > > > Hi Nitin, > > > > > > As I mentioned, there is a possibility that I return to compressed > > > cache development. In the next two or three weeks I will have a > > > decision about this. > > > > > > Concerning compressed cache structure, the fragmentation is in the > > > cells > > > > (ie > > > > > contiguous pages), and some metadata keep track of the fragmented spa= ce > > > in the cell. Of course there is a small memory space overhead to keep > > > this > > > > data > > > > > and a performance penalty, but it is negligible given the benefit we > > > have (some benchmark results would back this statement). 
What is > > > interesting in this metadata is that we don't have much overhead with > > > the fragmentation > > > > and > > > > > the compaction. If it is worth to compact a cell rather than allocate= a > > > > new > > > > > one or evict a compressed page, we do it after a quick hash table > > > lookup. Moving data is expensive, but it is less expensive than > > > eviction or a new page allocation (remember we are always under memory > > > pressure). > > > > > > In case of too much fragmentation in the compressed cache (and we don= 't > > > > have > > > > > any single cell whose compaction would result in enough free space), a > > > scheme to perform compaction moving compressed pages from a cell to > > > > another > > > > > could improve a lot the allocated memory space, but I am not sure the > > > cost would be affordable. Moreover, I am not sure this is a common > > > case, > > > > anyway, > > > > > but it is very simple to keep track of the overall free space to have= a > > > feeling of it after some tests. > > > > > > Your ideas for the use of compressed cache are very interesting. I > > > would > > > > say > > > > > that its usage in PDA/Cell phones is also something that may be > > > > interesting > > > > > (even though, in this case, performance improvement may not be the > > > primary goal). > > > > > > I went through the 2.6 memory code recently, but very briefly. I had > > > the impression that it does not have substantial changes in the > > > eviction path, except the inclusion of rmap. We could figure out how > > > knowing the > > > > processing > > > > > mapping a page could help us. Another research topic would be checking > > > kVMTrace developed by Scott Kaplan. Maybe it could help us improving > > > the heuristic to avoid maladaptivity, but still I don't know much abo= ut > > > it to advance our discussion at this moment. > > > > > > As a general guideline, I would suggest that you take into > > > consideration > > > > the > > > > > design decisions I took in this work. They are very important and are > > > all the lessons I learned during the implementation and my research: > > > > > > - Cells composed of 2 contiguous pages > > > - Page cache (not only swap cache) > > > - Heuristic to try to disable compressed cache when it is not useful > > > - Virtual swap address > > > - Page order > > > - Possibility of swap compression > > > - Idea behind profit and cost lists in the heuristic > > > - Scheduling impact > > > - Buffer/IO operations impact > > > - Adjustment of Linux watermarks > > > > > > Having all these points in mind will let us to start a port or new > > > implementation far ahead. In summary, the main lesson is that the > > > system > > > > is > > > > > very much sensitive to any changes in it (mainly reduction of memory > > > for other caches). > > > > > > Let's keep in touch. As soon as I have a decision about the possibili= ty > > > of working with you on this, I will let you know. > > > > > > Best regards, > > > > > > Rodrigo > > > > > > -----Original Message----- > > > From: Nitin Gupta [mailto:nit...@gm...] > > > Sent: quarta-feira, 16 de novembro de 2005 12:47 > > > To: ro...@te... > > > Cc: lin...@li... > > > Subject: Compressed cache status > > > > > > Hi Rodrigo, > > > > > > It was great to see your reply! > > > I've been working on compressed cache port to 2.6 for about 2 > > > months whenever I can take out the time. Currently not a single line > > > of code has been written. 
Since I was new to VMM subsystem, the > > > initial study took too much time. The details seemed overwhelming but > > > after a lot of work the picture is much clearer now. > > > So, the current status is that I have read a lot on VMM and your > > > paper on adaptive compressed caching. Now, I'm looking for the most > > > recent changes in kernel that can help compressed cache design. > > > Since, I've done some kernel patches (for CIFS vfs) before, which > > > involved lot of concurrency issues, I have a general feel of kernel > > > programming. So, I think once a complete "roadmap" is prepared, work > > > can really speed up. > > > > > > So, before I go hunting for the right kernel hooks, I wanted to > > > have following on paper: > > > > > > - Compressed cache structure - currently the cell based design (2-4 > > > pages taken as unit) has problem of fragmentation and relies on > > > compaction as last resort to free-up memory before starting to > > > swap-out to disks. Although, fragmentation cannot be eliminated but I > > > think a different design (for which I am looking for) can surely > > > reduce it. > > > Also, it needs to be addressable storage (kernel asks for page > > > in swap at xxxxx where is this in compressed cache?). Although you ha= ve > > > implemented this but I couldn't get it. > > > > > > - Adaptivity heuristic - I've read your current heuristic method but > > > as you mentioned that rmap and other recent developments in 2.6 can be > > > really helpful here. So, I'm now reading more on recent developments. > > > Heuristic quality will be very significant and can determine success > > > of this work. As you have shown setting up just the right static cache > > > size for a particular workload is so good for performance while the > > > wrong choice produces significant slowdown. > > > > > > - Compression algorithm - these can simply be reused from your > > > implementation. > > > > > > With these in hand, I can then proceed with implementation: find the > > > right kernel hooks, implement a "dummy" compressed cache (not really > > > do anything but just to make sure I've the right entry points), and > > > then gradually implement all the parts (from static to dynamic cache). > > > > > > Also, I'm particularly looking forward to its use in: > > > - LiveCD environment - swap in not available and CD-ROM read speed is > > > exceptionally slow. > > > - Virtualized environment - hard-disk is a file backed on host OS. So, > > > RAM and disc speed gap is even greater here (many more +ve factors > > > here). > > > > > > In general i'm not very busy and can spare a lot of time for this > > > work. But, unfortunately my semester exams are due to start early > > > december and last for about 3 weeks. However, I'm extremely interested > > > in this project and it would be great if I can get your help. > > > > > > > > > Best regards, > > > > > > Nitin |
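For comparison with the chunked proposal earlier in this thread, here is a userspace sketch of the 2-page cell bookkeeping Rodrigo describes: a cell is two contiguous pages, per-cell metadata records which compressed pages live where, and compaction only runs when the total free space is large enough but not contiguous at the end. Field names and the array-based fragment table are assumptions for illustration, not the layout of the existing 2.4 code.

#include <string.h>

#define PAGE_SIZE 4096
#define CELL_SIZE (2 * PAGE_SIZE)   /* a cell is two contiguous pages */
#define MAX_FRAGS 32

/* One compressed page stored inside a cell. */
struct frag {
	size_t offset;
	size_t len;
};

/* Per-cell metadata: which compressed pages live where and how much space is
 * free in total (possibly scattered until the cell is compacted).  A
 * zero-initialized struct cell is an empty cell; frags[] stays sorted by offset
 * because new data is always appended at the current tail. */
struct cell {
	unsigned char data[CELL_SIZE];
	struct frag frags[MAX_FRAGS];
	int nr_frags;
	size_t used;
};

static size_t cell_free_space(const struct cell *c)
{
	return CELL_SIZE - c->used;
}

/* Slide every fragment towards the front so all free space becomes one
 * contiguous region at the end; memmove handles overlapping copies. */
static void cell_compact(struct cell *c)
{
	size_t pos = 0;
	int i;

	for (i = 0; i < c->nr_frags; i++) {
		if (c->frags[i].offset != pos)
			memmove(c->data + pos, c->data + c->frags[i].offset, c->frags[i].len);
		c->frags[i].offset = pos;
		pos += c->frags[i].len;
	}
}

/* Insert a compressed page of len bytes; compact first if the free space is
 * sufficient in total but not contiguous at the end.  Returns -1 when the cell
 * cannot take the page, in which case the caller allocates a new cell or evicts. */
static int cell_insert(struct cell *c, const unsigned char *buf, size_t len)
{
	size_t tail;

	if (c->nr_frags == MAX_FRAGS || cell_free_space(c) < len)
		return -1;

	tail = c->nr_frags ? c->frags[c->nr_frags - 1].offset + c->frags[c->nr_frags - 1].len : 0;
	if (CELL_SIZE - tail < len) {
		cell_compact(c);
		tail = c->used;
	}
	memcpy(c->data + tail, buf, len);
	c->frags[c->nr_frags].offset = tail;
	c->frags[c->nr_frags].len = len;
	c->nr_frags++;
	c->used += len;
	return 0;
}

The trade-off Rodrigo points out is visible here: cell_compact moves data with memmove, which is expensive, but under memory pressure it is still cheaper than evicting a compressed page or allocating a fresh cell.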
From: Nitin G. <nit...@gm...> - 2005-11-28 20:36:31
|
Hi Rodrigo, We need not think about handling 4M pages. I fortunately had a talk with HugeTLB Filesystem maintainer himself at #kernelnewbies! This is what he told: Huge pages are never paged out. They are just paged in on demand (lazy mmap()'ing). Also, 2.6 kernels have a thrashing protection algorithm implemented (mm/thrash.c) (based on http://www.cs.wm.edu/~sjiang/token.pdf). This should significantly improve compressed cache performance as it will reduce no. of "false LRU pages" (as paper calls them) reaching compressed cache and hence lesser no. of unnecessary compression/decompressions. Just when everything was ready - latest vanilla kernel with all debugging turned on, my End-Term exams have struck. They are now dangerously close (3 Dec). So, I am sorry to tell you that I may not be able to work on it till 25 Dec. So, I think CVS branch can then be started for real work :) (my sf user: nitin_sf) Anyway, I will try reading more of existing code and keep up with relevant kernel updates and keep you informed. Best Regards, Nitin On 11/27/05, Rodrigo S de Castro <ro...@te...> wrote: > Hi Nitin, > > I am very sorry for the delay. I had restricted internet access this week > and couldn't answer your email earlier. > > If we are going to do a new implementation, I agree with the static > compressed cache approach. This was the first step of the current 2.4 > implementation and I think it was very useful. Actually, even if we were = to > port the current code, I think that disabling adaptivity would be very > useful to make it work stably initially and then we add and improve the > adaptivity heuristic. > > Concerning the kernel version, I would go with the latest vanilla version > and would update for every release, since I don't think that it would hav= e > many differences in -ac/-mm branches that could justify the effort of > keeping updated with them. In the past, during 2.4 version implementation= , > we achieved a point that some people asked to keep updated with -ac branc= h, > but it was already somewhat stable and people were using it. Therefore, I > think your approach would work. > > This week I discussed a little bit about compressed cache in Linux 2.6 an= d > there is something that we must think when implementing this new version.= In > 2.6 there is support for huge pages (4Mb pages instead of 4Kb, at least o= n > i386). We didn't have that in 2.4 and we must devise a way to handle them= . > > Finally, great to hear about taking compressed cache as your final > university project. I really hope we can work together on this. > > PS: Tell me your sourceforge user so I can add as a developer in the proj= ect > if you are about to start coding. Let's manage the CVS repository with > directories or branches for each implementation (2.4/2.6). > > Best regards, > > Rodrigo > > -----Original Message----- > From: Nitin Gupta [mailto:nit...@gm...] > Sent: ter=E7a-feira, 22 de novembro de 2005 08:36 > To: Rodrigo S de Castro > Cc: lin...@li... > Subject: Re: Compressed cache status > > Hi Rodrigo, > > I think the coding can now start from next week. > > For the compressed structure, I think the 2-page cell structure would > be best as you have done a lot of research here and my guess-work will > not help here. > > For the implementation, I think first getting static compressed cache > implemented for 2.6 kernel would be very useful. I think, by actually > working on VM code will give us a clear understanding of all the > relevant recent kernel changes. 
Although you have a lot of experience > working on VM code but for me, it would be very difficult to > practically come-up with a good heuristic now and go on with > implementing a dynamic compressed cache. Also, this will allow us to > parallelize the work on heuristic without delaying the work. > > Also, I think working on a new implementation instead of just trying > to port your previous work would be better since older bugs may become > very hard to trace down and this will also give more flexibility in > implementation. > > To start with the implementation should we start with the latest > vanilla kernel or -ac/-mm? > In my previous work I went with the latest vanilla and updated it > (about) every 3 weeks to keep the work current. Do you think this > approach will work here? > > I have also taken-up this as my final year university project and as > an entry for a competition sponsored by Red Hat > (http://ekalavya.iitb.ac.in/rhs/ - Group RHS051032). So, it will be > great to have you as mentor! > > > Best Regards, > > Nitin > > > > On 11/20/05, Rodrigo S de Castro <ro...@te...> wrote: > > Hi Nitin, > > > > As I mentioned, there is a possibility that I return to compressed cach= e > > development. In the next two or three weeks I will have a decision abou= t > > this. > > > > Concerning compressed cache structure, the fragmentation is in the cell= s > (ie > > contiguous pages), and some metadata keep track of the fragmented space= in > > the cell. Of course there is a small memory space overhead to keep this > data > > and a performance penalty, but it is negligible given the benefit we ha= ve > > (some benchmark results would back this statement). What is interesting= in > > this metadata is that we don't have much overhead with the fragmentatio= n > and > > the compaction. If it is worth to compact a cell rather than allocate a > new > > one or evict a compressed page, we do it after a quick hash table looku= p. > > Moving data is expensive, but it is less expensive than eviction or a n= ew > > page allocation (remember we are always under memory pressure). > > > > In case of too much fragmentation in the compressed cache (and we don't > have > > any single cell whose compaction would result in enough free space), a > > scheme to perform compaction moving compressed pages from a cell to > another > > could improve a lot the allocated memory space, but I am not sure the c= ost > > would be affordable. Moreover, I am not sure this is a common case, > anyway, > > but it is very simple to keep track of the overall free space to have a > > feeling of it after some tests. > > > > Your ideas for the use of compressed cache are very interesting. I woul= d > say > > that its usage in PDA/Cell phones is also something that may be > interesting > > (even though, in this case, performance improvement may not be the prim= ary > > goal). > > > > I went through the 2.6 memory code recently, but very briefly. I had th= e > > impression that it does not have substantial changes in the eviction pa= th, > > except the inclusion of rmap. We could figure out how knowing the > processing > > mapping a page could help us. Another research topic would be checking > > kVMTrace developed by Scott Kaplan. Maybe it could help us improving th= e > > heuristic to avoid maladaptivity, but still I don't know much about it = to > > advance our discussion at this moment. > > > > As a general guideline, I would suggest that you take into consideratio= n > the > > design decisions I took in this work. 
They are very important and are a= ll > > the lessons I learned during the implementation and my research: > > > > - Cells composed of 2 contiguous pages > > - Page cache (not only swap cache) > > - Heuristic to try to disable compressed cache when it is not useful > > - Virtual swap address > > - Page order > > - Possibility of swap compression > > - Idea behind profit and cost lists in the heuristic > > - Scheduling impact > > - Buffer/IO operations impact > > - Adjustment of Linux watermarks > > > > Having all these points in mind will let us to start a port or new > > implementation far ahead. In summary, the main lesson is that the syste= m > is > > very much sensitive to any changes in it (mainly reduction of memory fo= r > > other caches). > > > > Let's keep in touch. As soon as I have a decision about the possibility= of > > working with you on this, I will let you know. > > > > Best regards, > > > > Rodrigo > > > > -----Original Message----- > > From: Nitin Gupta [mailto:nit...@gm...] > > Sent: quarta-feira, 16 de novembro de 2005 12:47 > > To: ro...@te... > > Cc: lin...@li... > > Subject: Compressed cache status > > > > Hi Rodrigo, > > > > It was great to see your reply! > > I've been working on compressed cache port to 2.6 for about 2 > > months whenever I can take out the time. Currently not a single line > > of code has been written. Since I was new to VMM subsystem, the > > initial study took too much time. The details seemed overwhelming but > > after a lot of work the picture is much clearer now. > > So, the current status is that I have read a lot on VMM and your > > paper on adaptive compressed caching. Now, I'm looking for the most > > recent changes in kernel that can help compressed cache design. > > Since, I've done some kernel patches (for CIFS vfs) before, which > > involved lot of concurrency issues, I have a general feel of kernel > > programming. So, I think once a complete "roadmap" is prepared, work > > can really speed up. > > > > So, before I go hunting for the right kernel hooks, I wanted to > > have following on paper: > > > > - Compressed cache structure - currently the cell based design (2-4 > > pages taken as unit) has problem of fragmentation and relies on > > compaction as last resort to free-up memory before starting to > > swap-out to disks. Although, fragmentation cannot be eliminated but I > > think a different design (for which I am looking for) can surely > > reduce it. > > Also, it needs to be addressable storage (kernel asks for page = in > > swap at xxxxx where is this in compressed cache?). Although you have > > implemented this but I couldn't get it. > > > > - Adaptivity heuristic - I've read your current heuristic method but > > as you mentioned that rmap and other recent developments in 2.6 can be > > really helpful here. So, I'm now reading more on recent developments. > > Heuristic quality will be very significant and can determine success > > of this work. As you have shown setting up just the right static cache > > size for a particular workload is so good for performance while the > > wrong choice produces significant slowdown. > > > > - Compression algorithm - these can simply be reused from your > > implementation. > > > > With these in hand, I can then proceed with implementation: find the > > right kernel hooks, implement a "dummy" compressed cache (not really > > do anything but just to make sure I've the right entry points), and > > then gradually implement all the parts (from static to dynamic cache). 
> > > > Also, I'm particularly looking forward to its use in: > > - LiveCD environment - swap in not available and CD-ROM read speed is > > exceptionally slow. > > - Virtualized environment - hard-disk is a file backed on host OS. So, > > RAM and disc speed gap is even greater here (many more +ve factors > > here). > > > > In general i'm not very busy and can spare a lot of time for this > > work. But, unfortunately my semester exams are due to start early > > december and last for about 3 weeks. However, I'm extremely interested > > in this project and it would be great if I can get your help. > > > > > > Best regards, > > > > Nitin > > > > > > > > > > > > |