[lc-devel] Contest Behaviour with Compressed Cache
From: Rodrigo S. de C. <rc...@im...> - 2002-09-28 18:54:16
[Con and Rik, it's a long email, but I'd like you to know some of my conclusions about contest and mem_load. Comments are welcome.]

Contest [1] is a new benchmark by Con Kolivas that aims to test system responsiveness. It was announced on the linux kernel mailing list and has since attracted much attention from the kernel community.

Two weeks ago, Con kindly sent me some results from running his benchmark on a kernel with compressed cache. The first results were very good, so I published them right away with the other statistics. However, after some bug fixes in contest, we noticed that compressed cache wasn't improving system performance under the memory load test. As a matter of fact, performance was worse than on a vanilla kernel. Under IO load it improves, and for the other loads it makes no difference whether compressed cache is used or not.

First, let me briefly describe how contest works. It runs a kernel compilation (with -j4 concurrency level) under different load conditions. For example, memory load, which is the load I focused on, uses 110% of the system memory, allocating, touching and moving memory. There are other loads, like process load and IO load, and contest also benchmarks the compilation on a system without any load.

Given that the idea behind contest is very interesting, I think it would be nice to have compressed cache improving system performance when running this benchmark under the memory load condition. Even if that turns out not to be possible, we should at least understand what is going on. Therefore, I focused on this problem and I think I've come to interesting conclusions.

Running 2.4.18 vanilla and 2.4.18-0.24pre5 (not released yet), the results I got were:

2.4.18:    95.03s completion time, 76% CPU usage
2.4.18-cc: 99.84s completion time, 72% CPU usage

At first I thought our problem was the high number of compressions and decompressions, but in that case we would see a higher CPU usage, not a lower one.
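For anyone who hasn't looked at contest, the mem_load pattern (allocate more than RAM, then keep touching and moving it) can be sketched roughly like this. This is a toy Python sketch under my own assumptions, not contest's actual C code; the function name, the page-sized stride, and the small sizes in the example are mine (contest itself sizes the buffer at ~110% of physical RAM):

```python
def mem_load_sketch(total_bytes, passes=1, page=4096):
    """Toy version of a mem_load-style loop: allocate a buffer and keep
    touching and moving memory so the kernel is forced to evict pages.
    Returns the number of page-sized chunks touched."""
    buf = bytearray(total_bytes)
    touched = 0
    for _ in range(passes):
        # touch one byte per page-sized chunk so every page is made resident
        for off in range(0, total_bytes, page):
            buf[off] = (buf[off] + 1) & 0xFF
            touched += 1
        # "move" memory: copy the second half of the buffer over the first
        half = total_bytes // 2
        buf[:half] = buf[half:half * 2]
    return touched

# touching a 64 KiB buffer once visits 16 pages
print(mem_load_sketch(64 * 1024))  # -> 16
```

When total_bytes exceeds physical RAM, every pass over the buffer generates page faults, which on vanilla turn into the swapins/swapouts discussed below.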
Checking /proc/stat outputs, I could verify that compressed cache significantly reduces the amount of IO performed by the system, which should make the kernel compilation run by contest complete faster. From some profiling data, I also noticed that compressed cache reduces the time the CPU spends in the idle state very significantly: from 20.14 to 1.62 seconds.

The reason we get a worse completion time is that a kernel with compressed cache may have, and probably does have, different scheduling behaviour compared to a vanilla kernel. That's because we reduce (very much so, depending on the case) the IO performed by the system. IO, notably that done to service a page fault, forces the current process to relinquish the CPU while the scheduler runs another task. Given that we reduce the total IO, we get much less of this compulsory scheduling due to page faults, for example. We also spend some system time compressing and decompressing memory pages, which is added to the current task's system time (another reason for slightly different scheduling), but that ends up being less time than the IO operations would take.

Running contest with mem_load, a kernel with compressed cache doesn't perform any swap at all, very far from the over 60 thousand swapins and over 70 thousand swapouts performed by vanilla. Comparing mem_load and the kernel compilation: the former has all of its IO saved (its IO consists only of swapin/swapout operations), so it doesn't relinquish the CPU to perform IO as it does on a vanilla kernel. The kernel compilation, on the other hand, still has to perform some operations that cannot be saved (like writing .o files or reading source files), and even though it relinquishes the CPU less often than on vanilla, because we compress pages from the page cache, it still does so more often than mem_load. In short, on vanilla, mem_load is scheduled out much more often due to IO (think of the swapins/swapouts mentioned above), giving the kernel compilation more of the CPU than it gets on a kernel with compressed cache.
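The idle-time figures above come from /proc/stat style accounting, where the aggregate "cpu" line counts user, nice, system and idle time in jiffies. As a rough illustration of how such a number turns into seconds (a sketch assuming the usual USER_HZ=100 tick rate of 2.4-era kernels; the helper function is mine, not part of contest):

```python
def idle_seconds(stat_cpu_line, user_hz=100):
    """Parse the aggregate 'cpu' line from /proc/stat and return the
    idle time in seconds.  The fields after the 'cpu' label are
    user, nice, system, idle, counted in jiffies (USER_HZ ticks,
    typically 100 per second on 2.4 kernels)."""
    fields = stat_cpu_line.split()
    assert fields[0] == "cpu"
    idle_jiffies = int(fields[4])  # idle is the 4th value after the label
    return idle_jiffies / user_hz

# an idle field of 2014 jiffies corresponds to the 20.14 s measured on vanilla
print(idle_seconds("cpu 7603 0 1194 2014"))  # -> 20.14
```

Comparing two snapshots of this line, taken before and after a contest run, gives the idle-time delta for that run.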
With compressed cache, mem_load uses most of its CPU time slice because it doesn't have to perform IO, so the kernel compilation gets less of the CPU (that's why its CPU usage is smaller) and takes a little longer to finish. In spite of the worse compilation performance, the system, generally speaking, runs smoother. The mem_load loop runs through many more iterations than on vanilla; if you run mem_load with its debug printf()s, the difference is quite striking.

Under the memory load condition, contest only measures the time spent compiling the kernel; it doesn't take into account how the background processes are affected by the compilation. With compressed cache, this particular background process (mem_load) and the overall system perform better. Note that other background processes might show different results with compressed cache, depending on what they do. I don't think the current contest, in the memory load case, is suitable for benchmarking a system with compressed cache: it doesn't measure the improvement to the whole system, only to one process, and that measurement may be influenced by scheduling issues.

[1] http://contest.kolivas.net

Regards,

--
Rodrigo