[lc-devel] Re: Compressed cache status
From: Nitin G. <nit...@gm...> - 2005-11-30 17:46:24
Hi Rodrigo,

As the mail from Dilma DaSilva suggests, huge page support may undergo many changes in the near future. So, apart from studying the impact of compressed cache on applications using huge pages, we should also work out a cache structure for actually storing these 4M pages. Although we may not implement it now, it will be useful to have a way to handle them without affecting the existing cell structure too much, so that it is easier to incorporate later if need be.

Before I talked to wli (yes, he was William Irwin (wli), as in the maintainers file :) ), I thought of a simple way of handling 4M pages: simply store them separately and also maintain the corresponding metadata separately. When a huge page comes to be swapped out, compress it page-by-page, allocating as many single (4K) pages as required. When another huge page comes, start by allocating a new page and do not touch the space left over from storing the previous huge page, i.e. let the space in the last page be wasted (I think this will not be that costly and will greatly simplify the design). When a huge page is to be taken out, simply decompress it and free the associated pages in the compressed cache.

One immediate problem I can see here is that while compressing a huge page, pages have to be allocated on demand. If a huge page is being swapped out, the system must be really tight on memory, and until the huge page is completely compressed it cannot be freed. But once it is completely compressed, the system gains not only the saved space but also larger contiguous space availability. This "design" also eliminates the need for compaction and is completely separate from the cache for normal 4K pages. This is just an initial idea.
I haven't thought it out in detail, but just wanted to tell you this before I go invisible for a few weeks :)

Best Regards,
Nitin

On 11/30/05, Rodrigo S de Castro <ro...@te...> wrote:
> Hi Nitin,
>
> On Monday 28 November 2005 18:36, Nitin Gupta wrote:
> > Hi Rodrigo,
> > We need not think about handling 4M pages. I fortunately had a talk
> > with the HugeTLB filesystem maintainer himself at #kernelnewbies! This is
> > what he told me: huge pages are never paged out. They are just paged in
> > on demand (lazy mmap()'ing).
>
> Great that you talked to him. Is the maintainer William Irwin (wli), as in the
> maintainers file? He is very well known in the kernel world.
>
> If huge pages are not paged out, we must find out the impact on them of using a
> portion of the memory for compressed cache. In the first implementations,
> when I was only compressing swap pages, the impact on the other page cache
> pages was severe. Also on buffers, which ended up being written much more
> often. We must understand his implementation better to know if there is a
> chance that we introduce a negative impact on applications that may use huge
> pages.
>
> > Also, 2.6 kernels have a thrashing protection algorithm implemented
> > (mm/thrash.c) (based on http://www.cs.wm.edu/~sjiang/token.pdf). This
> > should significantly improve compressed cache performance, as it will
> > reduce the number of "false LRU pages" (as the paper calls them) reaching
> > the compressed cache, and hence fewer unnecessary
> > compressions/decompressions.
>
> Oh, I read this paper. Some time ago, I thought about implementing it after Rik
> van Riel mentioned it to me, but unfortunately things changed and I
> couldn't spend more time on kernel development at that time.
>
> I agree that it may help compressed cache.
However, perhaps some changes to
> the amount of time that an application holds the swap token may happen,
> depending on the compressed cache size and on how this value is initially
> defined in the implementation (I didn't read the code yet).
>
> > Just when everything was ready - latest vanilla kernel with all
> > debugging turned on, my end-term exams have struck. They are now
> > dangerously close (3 Dec). So, I am sorry to tell you that I may not
> > be able to work on it till 25 Dec.
>
> First finish all your exams, then you will be free to work intensively :-)
> Good luck with them!
>
> > So, I think the CVS branch can then be started for real work :)
> > (my sf user: nitin_sf)
>
> You are now a developer in the SF project.
>
> > Anyway, I will try reading more of the existing code and keep up with
> > relevant kernel updates and keep you informed.
>
> Excellent. I will try to find out more about huge pages and swap token impact and
> will let you know too.
>
> Best Regards
>
> > On 11/27/05, Rodrigo S de Castro <ro...@te...> wrote:
> > > Hi Nitin,
> > >
> > > I am very sorry for the delay. I had restricted internet access this week
> > > and couldn't answer your email earlier.
> > >
> > > If we are going to do a new implementation, I agree with the static
> > > compressed cache approach. This was the first step of the current 2.4
> > > implementation and I think it was very useful. Actually, even if we were
> > > to port the current code, I think that disabling adaptivity would be very
> > > useful to make it work stably initially; then we add and improve the
> > > adaptivity heuristic.
> > >
> > > Concerning the kernel version, I would go with the latest vanilla version
> > > and would update for every release, since I don't think there would
> > > be many differences in the -ac/-mm branches that could justify the effort
> > > of keeping updated with them.
In the past, during the 2.4
> > > implementation, we reached a point where some people asked us to keep
> > > updated with the -ac branch, but it was already somewhat stable and people
> > > were using it. Therefore, I think your approach would work.
> > >
> > > This week I discussed a little bit about compressed cache in Linux 2.6,
> > > and there is something that we must think about when implementing this new
> > > version. In 2.6 there is support for huge pages (4Mb pages instead of
> > > 4Kb, at least on i386). We didn't have that in 2.4 and we must devise a
> > > way to handle them.
> > >
> > > Finally, great to hear about taking compressed cache as your final
> > > university project. I really hope we can work together on this.
> > >
> > > PS: Tell me your sourceforge user so I can add you as a developer in the
> > > project if you are about to start coding. Let's manage the CVS repository
> > > with directories or branches for each implementation (2.4/2.6).
> > >
> > > Best regards,
> > >
> > > Rodrigo
> > >
> > > -----Original Message-----
> > > From: Nitin Gupta [mailto:nit...@gm...]
> > > Sent: Tuesday, 22 November 2005 08:36
> > > To: Rodrigo S de Castro
> > > Cc: lin...@li...
> > > Subject: Re: Compressed cache status
> > >
> > > Hi Rodrigo,
> > >
> > > I think the coding can now start from next week.
> > >
> > > For the compressed structure, I think the 2-page cell structure would
> > > be best, as you have done a lot of research here and my guesswork will
> > > not help.
> > >
> > > For the implementation, I think first getting a static compressed cache
> > > implemented for the 2.6 kernel would be very useful. I think actually
> > > working on the VM code will give us a clear understanding of all the
> > > relevant recent kernel changes.
Although you have a lot of experience
> > > working on VM code, for me it would be very difficult to
> > > come up with a good heuristic now and go on to
> > > implement a dynamic compressed cache. Also, this will allow us to
> > > parallelize the work on the heuristic without delaying the rest of the work.
> > >
> > > Also, I think working on a new implementation instead of just trying
> > > to port your previous work would be better, since older bugs may become
> > > very hard to trace down, and this will also give more flexibility in
> > > the implementation.
> > >
> > > To start the implementation, should we begin with the latest
> > > vanilla kernel or -ac/-mm?
> > > In my previous work I went with the latest vanilla and updated it
> > > (about) every 3 weeks to keep the work current. Do you think this
> > > approach will work here?
> > >
> > > I have also taken this up as my final year university project and as
> > > an entry for a competition sponsored by Red Hat
> > > (http://ekalavya.iitb.ac.in/rhs/ - Group RHS051032). So, it will be
> > > great to have you as mentor!
> > >
> > > Best Regards,
> > >
> > > Nitin
> > >
> > > On 11/20/05, Rodrigo S de Castro <ro...@te...> wrote:
> > > > Hi Nitin,
> > > >
> > > > As I mentioned, there is a possibility that I return to compressed
> > > > cache development. In the next two or three weeks I will have a
> > > > decision about this.
> > > >
> > > > Concerning compressed cache structure, the fragmentation is in the
> > > > cells (i.e. contiguous pages), and some metadata keep track of the
> > > > fragmented space in the cell. Of course there is a small memory space
> > > > overhead to keep this data and a performance penalty, but it is
> > > > negligible given the benefit we have (some benchmark results would
> > > > back this statement).
What is
> > > > interesting in this metadata is that we don't have much overhead with
> > > > the fragmentation and the compaction. If it is worth compacting a cell
> > > > rather than allocating a new one or evicting a compressed page, we do
> > > > it after a quick hash table lookup. Moving data is expensive, but it is
> > > > less expensive than eviction or a new page allocation (remember we are
> > > > always under memory pressure).
> > > >
> > > > In case of too much fragmentation in the compressed cache (when we
> > > > don't have any single cell whose compaction would result in enough
> > > > free space), a scheme to perform compaction by moving compressed pages
> > > > from one cell to another could improve the allocated memory space a
> > > > lot, but I am not sure the cost would be affordable. Moreover, I am
> > > > not sure this is a common case anyway, but it is very simple to keep
> > > > track of the overall free space to get a feeling for it after some tests.
> > > >
> > > > Your ideas for the use of compressed cache are very interesting. I
> > > > would say that its usage in PDAs/cell phones is also something that
> > > > may be interesting (even though, in this case, performance improvement
> > > > may not be the primary goal).
> > > >
> > > > I went through the 2.6 memory code recently, but very briefly. I had
> > > > the impression that it does not have substantial changes in the
> > > > eviction path, except the inclusion of rmap. We could figure out how
> > > > knowing the process mapping a page could help us. Another research
> > > > topic would be checking kVMTrace, developed by Scott Kaplan. Maybe it
> > > > could help us improve the heuristic to avoid maladaptivity, but I
> > > > still don't know enough about it to advance our discussion at this moment.
> > > > As a general guideline, I would suggest that you take into
> > > > consideration the design decisions I took in this work. They are very
> > > > important and are all the lessons I learned during the implementation
> > > > and my research:
> > > >
> > > > - Cells composed of 2 contiguous pages
> > > > - Page cache (not only swap cache)
> > > > - Heuristic to try to disable compressed cache when it is not useful
> > > > - Virtual swap address
> > > > - Page order
> > > > - Possibility of swap compression
> > > > - Idea behind profit and cost lists in the heuristic
> > > > - Scheduling impact
> > > > - Buffer/IO operations impact
> > > > - Adjustment of Linux watermarks
> > > >
> > > > Having all these points in mind will let us start a port or new
> > > > implementation far ahead. In summary, the main lesson is that the
> > > > system is very sensitive to any changes in it (mainly reduction of
> > > > memory for other caches).
> > > >
> > > > Let's keep in touch. As soon as I have a decision about the possibility
> > > > of working with you on this, I will let you know.
> > > >
> > > > Best regards,
> > > >
> > > > Rodrigo
> > > >
> > > > -----Original Message-----
> > > > From: Nitin Gupta [mailto:nit...@gm...]
> > > > Sent: Wednesday, 16 November 2005 12:47
> > > > To: ro...@te...
> > > > Cc: lin...@li...
> > > > Subject: Compressed cache status
> > > >
> > > > Hi Rodrigo,
> > > >
> > > > It was great to see your reply!
> > > > I've been working on the compressed cache port to 2.6 for about 2
> > > > months, whenever I can take out the time. Currently not a single line
> > > > of code has been written. Since I was new to the VMM subsystem, the
> > > > initial study took a lot of time. The details seemed overwhelming, but
> > > > after a lot of work the picture is much clearer now.
> > > > So, the current status is that I have read a lot on the VMM and your
> > > > paper on adaptive compressed caching. Now, I'm looking into the most
> > > > recent changes in the kernel that can help the compressed cache design.
> > > > Since I've done some kernel patches (for the CIFS vfs) before, which
> > > > involved a lot of concurrency issues, I have a general feel for kernel
> > > > programming. So, I think once a complete "roadmap" is prepared, work
> > > > can really speed up.
> > > >
> > > > So, before I go hunting for the right kernel hooks, I wanted to
> > > > have the following on paper:
> > > >
> > > > - Compressed cache structure - currently the cell-based design (2-4
> > > > pages taken as a unit) has the problem of fragmentation and relies on
> > > > compaction as a last resort to free up memory before starting to
> > > > swap out to disk. Although fragmentation cannot be eliminated, I
> > > > think a different design (which I am looking for) can surely reduce it.
> > > > Also, it needs to be addressable storage (the kernel asks for the page
> > > > in swap at xxxxx - where is this in the compressed cache?). Although
> > > > you have implemented this, I couldn't get it.
> > > >
> > > > - Adaptivity heuristic - I've read your current heuristic method, but
> > > > as you mentioned, rmap and other recent developments in 2.6 can be
> > > > really helpful here. So, I'm now reading more on recent developments.
> > > > Heuristic quality will be very significant and can determine the
> > > > success of this work. As you have shown, setting just the right static
> > > > cache size for a particular workload is very good for performance,
> > > > while the wrong choice produces significant slowdown.
> > > >
> > > > - Compression algorithm - these can simply be reused from your
> > > > implementation.
> > > > With these in hand, I can then proceed with the implementation: find
> > > > the right kernel hooks, implement a "dummy" compressed cache (not
> > > > really doing anything, but just to make sure I've got the right entry
> > > > points), and then gradually implement all the parts (from static to
> > > > dynamic cache).
> > > >
> > > > Also, I'm particularly looking forward to its use in:
> > > > - LiveCD environments - swap is not available and CD-ROM read speed
> > > > is exceptionally slow.
> > > > - Virtualized environments - the hard disk is a file backed on the
> > > > host OS, so the RAM/disk speed gap is even greater here (many more
> > > > +ve factors here).
> > > >
> > > > In general I'm not very busy and can spare a lot of time for this
> > > > work. But, unfortunately, my semester exams are due to start in early
> > > > December and last for about 3 weeks. However, I'm extremely interested
> > > > in this project and it would be great if I can get your help.
> > > >
> > > > Best regards,
> > > >
> > > > Nitin