| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2009 | | | | | 32 | 66 | 102 | 78 | 106 | 137 | 147 | 147 |
| 2010 | 71 | 139 | 86 | 76 | 57 | 10 | 12 | 6 | 8 | 12 | 12 | 18 |
| 2011 | 16 | 19 | 3 | 1 | 16 | 17 | 74 | 22 | 18 | 24 | 21 | 30 |
| 2012 | 31 | 16 | 22 | 25 | 18 | 13 | 83 | 49 | 20 | 60 | 35 | 28 |
| 2013 | 39 | 61 | 35 | 21 | 45 | 56 | 20 | 9 | 10 | 31 | 8 | 4 |
| 2014 | 6 | 7 | 7 | 6 | 4 | 8 | 5 | 2 | 4 | 4 | 11 | 5 |
| 2015 | 4 | 4 | 3 | 4 | 9 | 4 | 15 | 8 | 16 | 18 | 15 | 7 |
| 2016 | 20 | 9 | 15 | 24 | 16 | 28 | 22 | 23 | 18 | 30 | 40 | 9 |
| 2017 | 1 | 8 | 37 | 26 | 25 | 46 | 24 | 9 | | | | |
From: Satoru M. <sat...@hd...> - 2012-05-23 20:45:34
|
Hi Andrew,

This patch has been under review for a couple of months. It *only* improves the behavior when the kernel has enough filebacked pages; it does not change the behavior when the kernel has only a small number of filebacked pages.

Kosaki-san pointed out that the threshold we use to decide whether there are enough filebacked pages is not appropriate (*).

(*) http://www.spinics.net/lists/linux-mm/msg32380.html

As I described in (**), I believe the threshold should be discussed in another thread, because it affects more than the swappiness=0 case, and below the threshold the kernel behaves the same way with or without this patch.

(**) http://www.spinics.net/lists/linux-mm/msg34317.html

The patch may not be perfect, but at least it improves the kernel behavior when there is enough filebacked memory. I believe it's better than nothing.

Do you have any comments on it?

NOTE: I updated the patch with Acked-by tags.

---

Sometimes we'd like to avoid swapping out anonymous memory
in particular, avoid swapping out pages of important process or
process groups while there is a reasonable amount of pagecache
on RAM so that we can satisfy our customers' requirements.

OTOH, we can control how aggressive the kernel will swap memory pages
with /proc/sys/vm/swappiness for global and
/sys/fs/cgroup/memory/memory.swappiness for each memcg.

But with current reclaim implementation, the kernel may swap out
even if we set swappiness==0 and there is pagecache on RAM.

This patch changes the behavior with swappiness==0. If we set
swappiness==0, the kernel does not swap out completely
(for global reclaim until the amount of free pages and filebacked
pages in a zone has been reduced to something very very small
(nr_free + nr_filebacked < high watermark)).

Any comments are welcome.

Regards,
Satoru Moriya

Signed-off-by: Satoru Moriya <sat...@hd...>
Acked-by: Minchan Kim <mi...@ke...>
Acked-by: Rik van Riel <ri...@re...>
---
 mm/vmscan.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 33dc256..52d64bf 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1983,10 +1983,10 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
 	 * proportional to the fraction of recently scanned pages on
 	 * each list that were recently referenced and in active use.
 	 */
-	ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
+	ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1);
 	ap /= reclaim_stat->recent_rotated[0] + 1;
 
-	fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
+	fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
 	fp /= reclaim_stat->recent_rotated[1] + 1;
 	spin_unlock_irq(&mz->zone->lru_lock);
 
@@ -1999,7 +1999,7 @@ out:
 		unsigned long scan;
 
 		scan = zone_nr_lru_pages(mz, lru);
-		if (priority || noswap) {
+		if (priority || noswap || !vmscan_swappiness(mz, sc)) {
 			scan >>= priority;
 			if (!scan && force_scan)
 				scan = SWAP_CLUSTER_MAX;
-- 
1.7.6.5
|
From: Satoru M. <sat...@hd...> - 2012-05-21 13:39:47
|
Hi Richard,

On 05/21/2012 03:12 AM, Richard Davies wrote:
> Now that 3.4 is out with Rik's fixes, I'm keen to start testing with
> and without this extra patch.
>
> Satoru - should I just apply your original patch (most likely), or do
> you need to update for the final released kernel?

Thank you for testing! I believe you can apply the patch without any updates.

Regards,
Satoru
|
From: Richard D. <ric...@el...> - 2012-05-21 07:12:41
|
Hi Satoru,

Rik van Riel wrote:
> KOSAKI Motohiro wrote:
> > Richard Davies wrote:
> > > Satoru Moriya wrote:
> > > > > I have run into problems with heavy swapping with swappiness==0 and
> > > > > was pointed to this thread (
> > > > > http://marc.info/?l=linux-mm&m=133522782307215 )
> > > >
> > > > Did you test this patch with your workload?
> > >
> > > I haven't yet tested this patch. It takes a long time since these are
> > > production machines, and the bug itself takes several weeks of
> > > production use to really show up.
> > >
> > > Rik van Riel has pointed out a lot of VM tweaks that he put into 3.4:
> > > http://marc.info/?l=linux-mm&m=133536506926326
> > >
> > > My intention is to reboot half of our machines into plain 3.4 once it
> > > is out, and half onto 3.4 + your patch.
> > >
> > > Then we can compare behaviour.
> > >
> > > Will your patch apply cleanly on 3.4?
> >
> > Note. This patch doesn't solve your issue. This patch mean,
> > when occuring very few swap io, it change to 0. But you said
> > you are seeing eager swap io. As Dave already pointed out, your
> > machine have buffer head issue.
> >
> > So, this thread is pointless.
>
> Running KVM guests directly off block devices results in a lot
> of buffer cache.
>
> I suspect that this patch will in fact fix Richard's issue.
>
> The patch is small, fairly simple and looks like it will fix
> people's problems. It also makes swappiness=0 behave the way
> most people seem to imagine it would work.
>
> If it works for a few people (test results), I believe we
> might as well merge it.
>
> Yes, for cgroups we may need additional logic, but we can
> sort that out as we go along.

Now that 3.4 is out with Rik's fixes, I'm keen to start testing with and without this extra patch.

Satoru - should I just apply your original patch (most likely), or do you need to update for the final released kernel?

Thanks,

Richard.
|
From: tip-bot f. S. A. <sei...@hd...> - 2012-05-18 12:11:48
|
Commit-ID:  62be73eafaa045d3233337303fb140f7f8a61135
Gitweb:     http://git.kernel.org/tip/62be73eafaa045d3233337303fb140f7f8a61135
Author:     Seiji Aguchi <sei...@hd...>
AuthorDate: Tue, 15 May 2012 17:35:09 -0400
Committer:  Ingo Molnar <mi...@ke...>
CommitDate: Fri, 18 May 2012 14:02:10 +0200

kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop()

This patch moves kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop(),
to serialize the crash-logging process via smp_send_stop() and to
thus retrieve a more stable crash image of all CPUs stopped.

Signed-off-by: Seiji Aguchi <sei...@hd...>
Acked-by: Don Zickus <dz...@re...>
Cc: dle...@li... <dle...@li...>
Cc: Satoru Moriya <sat...@hd...>
Cc: Tony Luck <ton...@in...>
Cc: a.p...@ch... <a.p...@ch...>
Link: http://lkml.kernel.org/r/5C4...@US...
Signed-off-by: Ingo Molnar <mi...@ke...>
---
 kernel/panic.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index b6215b7..d2a5f4e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -108,8 +108,6 @@ void panic(const char *fmt, ...)
 	 */
 	crash_kexec(NULL);
 
-	kmsg_dump(KMSG_DUMP_PANIC);
-
 	/*
 	 * Note smp_send_stop is the usual smp shutdown function, which
 	 * unfortunately means it may not be hardened to work in a panic
@@ -117,6 +115,8 @@ void panic(const char *fmt, ...)
 	 */
 	smp_send_stop();
 
+	kmsg_dump(KMSG_DUMP_PANIC);
+
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
 	bust_spinlocks(0);
|
From: Ingo M. <mi...@ke...> - 2012-05-18 12:03:20
|
* Peter Zijlstra <a.p...@ch...> wrote:

> On Fri, 2012-05-18 at 03:49 -0700, tip-bot for Seiji Aguchi wrote:
>
> > kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop()
> >
> > This patch moves kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop(),
> > to serialize the crash-logging process via smp_send_stop() and to
> > thus retrieve a more stable crash image of all CPUs stopped.
>
> I don't want to be a spoil sport or anything, but this patch doesn't
> move anything, it just removes..

Hm, indeed. And the patch in the email is fine. I think I messed up a conflict resolution ...

Should be fixed now.

Thanks,

	Ingo
|
From: Peter Z. <a.p...@ch...> - 2012-05-18 11:47:33
|
On Fri, 2012-05-18 at 03:49 -0700, tip-bot for Seiji Aguchi wrote:
> kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop()
>
> This patch moves kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop(),
> to serialize the crash-logging process via smp_send_stop() and to
> thus retrieve a more stable crash image of all CPUs stopped.

I don't want to be a spoil sport or anything, but this patch doesn't
move anything, it just removes..

> ---
>  kernel/panic.c |    2 --
>  1 files changed, 0 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/panic.c b/kernel/panic.c
> index b6215b7..d4f0b61 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -108,8 +108,6 @@ void panic(const char *fmt, ...)
>  	 */
>  	crash_kexec(NULL);
>  
> -	kmsg_dump(KMSG_DUMP_PANIC);
> -
>  	/*
>  	 * Note smp_send_stop is the usual smp shutdown function, which
>  	 * unfortunately means it may not be hardened to work in a panic
|
From: tip-bot f. S. A. <sei...@hd...> - 2012-05-18 10:50:13
|
Commit-ID:  f80f749132165fb083865e51afd65cf0f05dcd62
Gitweb:     http://git.kernel.org/tip/f80f749132165fb083865e51afd65cf0f05dcd62
Author:     Seiji Aguchi <sei...@hd...>
AuthorDate: Tue, 15 May 2012 17:35:09 -0400
Committer:  Ingo Molnar <mi...@ke...>
CommitDate: Fri, 18 May 2012 10:07:37 +0200

kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop()

This patch moves kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop(),
to serialize the crash-logging process via smp_send_stop() and to
thus retrieve a more stable crash image of all CPUs stopped.

Signed-off-by: Seiji Aguchi <sei...@hd...>
Acked-by: Don Zickus <dz...@re...>
Cc: dle...@li... <dle...@li...>
Cc: Satoru Moriya <sat...@hd...>
Cc: Tony Luck <ton...@in...>
Cc: a.p...@ch... <a.p...@ch...>
Link: http://lkml.kernel.org/r/5C4...@US...
Signed-off-by: Ingo Molnar <mi...@ke...>
---
 kernel/panic.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index b6215b7..d4f0b61 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -108,8 +108,6 @@ void panic(const char *fmt, ...)
 	 */
 	crash_kexec(NULL);
 
-	kmsg_dump(KMSG_DUMP_PANIC);
-
 	/*
 	 * Note smp_send_stop is the usual smp shutdown function, which
 	 * unfortunately means it may not be hardened to work in a panic
|
From: Seiji A. <sei...@hd...> - 2012-05-15 21:35:36
|
Hi,

As Don mentioned in the following thread, it would be nice for pstore/kmsg_dump to serialize the panic path, because then they can log messages reliably.

https://lkml.org/lkml/2011/10/13/427

This patch is based on Don's proposal switching smp_send_stop() from REBOOT_VECTOR to NMI, which has already been merged into the -tip tree.

https://lkml.org/lkml/2012/5/14/145

[Patch Description]

This patch just moves kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop() to serialize the logging process via smp_send_stop().

Signed-off-by: Seiji Aguchi <sei...@hd...>
Acked-by: Don Zickus <dz...@re...>
---
 kernel/panic.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 80aed44..da585b8 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -108,8 +108,6 @@ void panic(const char *fmt, ...)
 	 */
 	crash_kexec(NULL);
 
-	kmsg_dump(KMSG_DUMP_PANIC);
-
 	/*
 	 * Note smp_send_stop is the usual smp shutdown function, which
 	 * unfortunately means it may not be hardened to work in a panic
@@ -117,6 +115,8 @@ void panic(const char *fmt, ...)
 	 */
 	smp_send_stop();
 
+	kmsg_dump(KMSG_DUMP_PANIC);
+
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
 	bust_spinlocks(0);
-- 
1.7.1
|
From: Rik v. R. <ri...@re...> - 2012-05-12 22:21:27
|
On 05/11/2012 05:11 PM, Satoru Moriya wrote:
> On 04/20/2012 08:21 PM, Satoru Moriya wrote:
>> Ah yes, it is not so small now.
>> On 4GB server, without THP min_free_kbytes is 8113 but with THP it is
>> 67584.
>>
>> How about using low watermark or min watermark?
>> Are they still big?
>>
>> ...or should we use other value?
>
> What do you think of the idea above?

I believe that using the high watermark is just fine. We want to start swapping before the page cache is so small that we start thrashing from that.

> So, I propose that we start with applying this patch first
> and then discuss/improve the threshold.
>
> The patch may not be perfect but, at least, we can improve
> the kernel behavior in the enough filebacked memory case
> with this patch. I believe it's better than nothing.

Agreed.

-- 
All rights reversed
|
From: Satoru M. <sat...@hd...> - 2012-05-11 21:11:52
|
On 04/20/2012 08:21 PM, Satoru Moriya wrote:
> On 04/04/2012 01:38 PM, KOSAKI Motohiro wrote:
>> (4/3/12 4:25 AM), Jerome Marchand wrote:
>>> On 04/02/2012 07:10 PM, KOSAKI Motohiro wrote:
>>>> 2012/3/30 Satoru Moriya <sat...@hd...>:
>>>>> So the kernel reclaims pages like following.
>>>>>
>>>>> nr_free + nr_filebacked >= watermark_high: reclaim only filebacked pages
>>>>> nr_free + nr_filebacked < watermark_high: reclaim only anonymous pages
>>>>
>>>> How?
>>>
>>> get_scan_count() checks that case explicitly:
>>>
>>> 	if (global_reclaim(sc)) {
>>> 		free = zone_page_state(mz->zone, NR_FREE_PAGES);
>>> 		/* If we have very few page cache pages,
>>> 		   force-scan anon pages. */
>>> 		if (unlikely(file + free <= high_wmark_pages(mz->zone))) {
>>> 			fraction[0] = 1;
>>> 			fraction[1] = 0;
>>> 			denominator = 1;
>>> 			goto out;
>>> 		}
>>> 	}
>>
>> Eek. This is silly. Nowaday many people enabled THP and it increase zone watermark.
>> so, high watermask is not good threshold anymore.
>
> Ah yes, it is not so small now.
> On 4GB server, without THP min_free_kbytes is 8113 but with THP it is
> 67584.
>
> How about using low watermark or min watermark?
> Are they still big?
>
> ...or should we use other value?
What do you think of the idea above?
By the way, I'd like to discuss this topic in another thread,
because the question of the optimal threshold at which the kernel
changes its reclaim policy affects not only the swappiness==0
case but all other settings as well.
So, I propose that we start with applying this patch first
and then discuss/improve the threshold.
The patch may not be perfect but, at least, we can improve
the kernel behavior in the enough filebacked memory case
with this patch. I believe it's better than nothing.
Regards,
Satoru
|
|
From: Minchan K. <mi...@ke...> - 2012-05-08 00:20:25
|
On 05/08/2012 05:09 AM, Rik van Riel wrote:
> On 04/26/2012 11:41 AM, KOSAKI Motohiro wrote:
>> On Thu, Apr 26, 2012 at 10:26 AM, Richard Davies
>> <ric...@el...> wrote:
>>> Satoru Moriya wrote:
>>>>> I have run into problems with heavy swapping with swappiness==0 and
>>>>> was pointed to this thread (
>>>>> http://marc.info/?l=linux-mm&m=133522782307215 )
>>>>
>>>> Did you test this patch with your workload?
>>>
>>> I haven't yet tested this patch. It takes a long time since these are
>>> production machines, and the bug itself takes several weeks of
>>> production use to really show up.
>>>
>>> Rik van Riel has pointed out a lot of VM tweaks that he put into 3.4:
>>> http://marc.info/?l=linux-mm&m=133536506926326
>>>
>>> My intention is to reboot half of our machines into plain 3.4 once it is
>>> out, and half onto 3.4 + your patch.
>>>
>>> Then we can compare behaviour.
>>>
>>> Will your patch apply cleanly on 3.4?
>>
>> Note. This patch doesn't solve your issue. This patch mean,
>> when occuring very few swap io, it change to 0. But you said
>> you are seeing eager swap io. As Dave already pointed out, your
>> machine have buffer head issue.
>>
>> So, this thread is pointless.
>
> Running KVM guests directly off block devices results in a lot
> of buffer cache.
>
> I suspect that this patch will in fact fix Richard's issue.
>
> The patch is small, fairly simple and looks like it will fix
> people's problems. It also makes swappiness=0 behave the way
> most people seem to imagine it would work.
>
> If it works for a few people (test results), I believe we
> might as well merge it.
>
> Yes, for cgroups we may need additional logic, but we can
> sort that out as we go along.

I agree with Rik's opinion absolutely.

-- 
Kind regards,
Minchan Kim
|
From: Rik v. R. <ri...@re...> - 2012-05-07 20:11:42
|
On 03/02/2012 12:36 PM, Satoru Moriya wrote:
> Sometimes we'd like to avoid swapping out anonymous memory
> in particular, avoid swapping out pages of important process or
> process groups while there is a reasonable amount of pagecache
> on RAM so that we can satisfy our customers' requirements.
>
> OTOH, we can control how aggressive the kernel will swap memory pages
> with /proc/sys/vm/swappiness for global and
> /sys/fs/cgroup/memory/memory.swappiness for each memcg.
>
> But with current reclaim implementation, the kernel may swap out
> even if we set swappiness==0 and there is pagecache on RAM.
>
> This patch changes the behavior with swappiness==0. If we set
> swappiness==0, the kernel does not swap out completely
> (for global reclaim until the amount of free pages and filebacked
> pages in a zone has been reduced to something very very small
> (nr_free + nr_filebacked < high watermark)).
>
> Signed-off-by: Satoru Moriya <sat...@hd...>

Acked-by: Rik van Riel <ri...@re...>

-- 
All rights reversed
|
From: Rik v. R. <ri...@re...> - 2012-05-07 20:10:13
|
On 04/26/2012 11:41 AM, KOSAKI Motohiro wrote:
> On Thu, Apr 26, 2012 at 10:26 AM, Richard Davies
> <ric...@el...> wrote:
>> Satoru Moriya wrote:
>>>> I have run into problems with heavy swapping with swappiness==0 and
>>>> was pointed to this thread (
>>>> http://marc.info/?l=linux-mm&m=133522782307215 )
>>>
>>> Did you test this patch with your workload?
>>
>> I haven't yet tested this patch. It takes a long time since these are
>> production machines, and the bug itself takes several weeks of production
>> use to really show up.
>>
>> Rik van Riel has pointed out a lot of VM tweaks that he put into 3.4:
>> http://marc.info/?l=linux-mm&m=133536506926326
>>
>> My intention is to reboot half of our machines into plain 3.4 once it is
>> out, and half onto 3.4 + your patch.
>>
>> Then we can compare behaviour.
>>
>> Will your patch apply cleanly on 3.4?
>
> Note. This patch doesn't solve your issue. This patch mean,
> when occuring very few swap io, it change to 0. But you said
> you are seeing eager swap io. As Dave already pointed out, your
> machine have buffer head issue.
>
> So, this thread is pointless.

Running KVM guests directly off block devices results in a lot
of buffer cache.

I suspect that this patch will in fact fix Richard's issue.

The patch is small, fairly simple and looks like it will fix
people's problems. It also makes swappiness=0 behave the way
most people seem to imagine it would work.

If it works for a few people (test results), I believe we
might as well merge it.

Yes, for cgroups we may need additional logic, but we can
sort that out as we go along.

-- 
All rights reversed
|
From: Rik v. R. <ri...@re...> - 2012-04-27 13:55:54
|
On 04/26/2012 10:50 AM, Christoph Lameter wrote:
> On Tue, 24 Apr 2012, Richard Davies wrote:
>
>> I strongly believe that Linux should have a way to turn off swapping unless
>> absolutely necessary. This means that users like us can run with swap
>> present for emergency use, rather than having to disable it because of the
>> side effects.
>
> Agree. And this ooperation mode should be the default behavior given that
> swapping is a very slow and tedious process these days.

I believe that is a bad idea. With cgroups, the situation is a whole lot less obvious than with the simple test done in this patch.

Let's see how the 3.4 code behaves, and if we need any additional changes to reduce swapping and step up reclaiming of page cache...

-- 
All rights reversed
|
From: Christoph L. <cl...@li...> - 2012-04-26 18:20:22
|
On Thu, 26 Apr 2012, KOSAKI Motohiro wrote:

> (4/26/12 10:50 AM), Christoph Lameter wrote:
> > On Tue, 24 Apr 2012, Richard Davies wrote:
> >
> > > I strongly believe that Linux should have a way to turn off swapping
> > > unless absolutely necessary. This means that users like us can run with swap
> > > present for emergency use, rather than having to disable it because of the
> > > side effects.
> >
> > Agree. And this ooperation mode should be the default behavior given that
> > swapping is a very slow and tedious process these days.
>
> Even though current patch is not optimal, I don't disagree this opinion. Can
> you please explain your use case? Why don't you use swapoff?

Because I do not want to have systems go OOM. In an emergency let's use swap (and maybe generate some sort of alert if that happens).

> Off topic: I hope linux is going to aim good swap clustered io in future.
> Especially when using THP, 4k size io is not really good.

Swap to regular disks is going to be an ever greater problem, since the access speed of rotational media has not changed much whereas the processing performance of the CPU has increased significantly. There is an ever increasing gap in speed.
|
From: Richard D. <ric...@el...> - 2012-04-26 16:09:18
|
KOSAKI Motohiro wrote:
> Christoph Lameter wrote:
> > Richard Davies wrote:
> > >
> > > I strongly believe that Linux should have a way to turn off swapping unless
> > > absolutely necessary. This means that users like us can run with swap
> > > present for emergency use, rather than having to disable it because of the
> > > side effects.
> >
> > Agree. And this ooperation mode should be the default behavior given that
> > swapping is a very slow and tedious process these days.
>
> Even though current patch is not optimal, I don't disagree this opinion. Can
> you please explain your use case? Why don't you use swapoff?

My use case is that I have large (64 or 128GB RAM) qemu-kvm virtualization hosts, running many (20-50) VMs.

Typically the total memory in use is less than physical memory. In these cases I would like the virtualization host to run without any swapping.

I have set swappiness==0, but in practice I get big load spikes from swapping. See http://marc.info/?l=linux-mm&m=133517452117581

I don't want to run swapoff, because sometimes I will need to provision slightly more VMs than physical memory, and in these cases I would rather that the system runs with a little swap in use rather than the OOM killer occurring.

Richard.
|
From: KOSAKI M. <kos...@gm...> - 2012-04-26 15:42:23
|
On Thu, Apr 26, 2012 at 10:26 AM, Richard Davies <ric...@el...> wrote:
> Satoru Moriya wrote:
>> > I have run into problems with heavy swapping with swappiness==0 and
>> > was pointed to this thread (
>> > http://marc.info/?l=linux-mm&m=133522782307215 )
>>
>> Did you test this patch with your workload?
>
> I haven't yet tested this patch. It takes a long time since these are
> production machines, and the bug itself takes several weeks of production
> use to really show up.
>
> Rik van Riel has pointed out a lot of VM tweaks that he put into 3.4:
> http://marc.info/?l=linux-mm&m=133536506926326
>
> My intention is to reboot half of our machines into plain 3.4 once it is
> out, and half onto 3.4 + your patch.
>
> Then we can compare behaviour.
>
> Will your patch apply cleanly on 3.4?

Note. This patch doesn't solve your issue. This patch mean, when occuring very few swap io, it change to 0. But you said you are seeing eager swap io. As Dave already pointed out, your machine have buffer head issue.

So, this thread is pointless.
|
From: KOSAKI M. <kos...@gm...> - 2012-04-26 15:37:09
|
(4/26/12 10:50 AM), Christoph Lameter wrote:
> On Tue, 24 Apr 2012, Richard Davies wrote:
>
>> I strongly believe that Linux should have a way to turn off swapping unless
>> absolutely necessary. This means that users like us can run with swap
>> present for emergency use, rather than having to disable it because of the
>> side effects.
>
> Agree. And this ooperation mode should be the default behavior given that
> swapping is a very slow and tedious process these days.

Even though current patch is not optimal, I don't disagree this opinion. Can you please explain your use case? Why don't you use swapoff?

Off topic: I hope linux is going to aim good swap clustered io in future. Especially when using THP, 4k size io is not really good.
|
From: Christoph L. <cl...@li...> - 2012-04-26 15:17:15
|
On Tue, 24 Apr 2012, Richard Davies wrote:

> I strongly believe that Linux should have a way to turn off swapping unless
> absolutely necessary. This means that users like us can run with swap
> present for emergency use, rather than having to disable it because of the
> side effects.

Agree. And this operation mode should be the default behavior, given that swapping is a very slow and tedious process these days.
|
From: Richard D. <ric...@el...> - 2012-04-26 14:27:09
|
Satoru Moriya wrote:
> > I have run into problems with heavy swapping with swappiness==0 and
> > was pointed to this thread (
> > http://marc.info/?l=linux-mm&m=133522782307215 )
>
> Did you test this patch with your workload?

I haven't yet tested this patch. It takes a long time since these are production machines, and the bug itself takes several weeks of production use to really show up.

Rik van Riel has pointed out a lot of VM tweaks that he put into 3.4:
http://marc.info/?l=linux-mm&m=133536506926326

My intention is to reboot half of our machines into plain 3.4 once it is out, and half onto 3.4 + your patch.

Then we can compare behaviour.

Will your patch apply cleanly on 3.4?

Richard.
|
From: Satoru M. <sat...@hd...> - 2012-04-24 22:15:09
|
On 04/24/2012 04:20 AM, Richard Davies wrote:
>
> I have run into problems with heavy swapping with swappiness==0 and
> was pointed to this thread (
> http://marc.info/?l=linux-mm&m=133522782307215 )

Did you test this patch with your workload?
If yes, how did it come out?

> I strongly believe that Linux should have a way to turn off swapping
> unless absolutely necessary. This means that users like us can run
> with swap present for emergency use, rather than having to disable it
> because of the side effects.

Agreed. That is why I proposed the patch.

> Personally, I feel that swappiness==0 should have this (intuitive)
> meaning, and that people running RHEL5 are extremely unlikely to run
> 3.5 kernels(!)
>
> However, swappiness==-1 or some other hack is definitely better than
> no patch.

Regards,
Satoru
|
From: Richard D. <ri...@ar...> - 2012-04-24 08:46:03
|
On 03/07/2012 18:18 PM, Satoru Moriya wrote:
> On 03/07/2012 12:19 PM, KOSAKI Motohiro wrote:
>> On 3/5/2012 4:56 PM, Johannes Weiner wrote:
>>> On Fri, Mar 02, 2012 at 12:36:40PM -0500, Satoru Moriya wrote:
>>>>
>>>> This patch changes the behavior with swappiness==0. If we set
>>>> swappiness==0, the kernel does not swap out completely (for global
>>>> reclaim until the amount of free pages and filebacked pages in a
>>>> zone has been reduced to something very very small (nr_free +
>>>> nr_filebacked < high watermark)).
>>>>
>>>> Any comments are welcome.
>>>
>>> Last time I tried that (getting rid of sc->may_swap, using
>>> !swappiness), it was rejected it as there were users who relied on
>>> swapping very slowly with this setting.
>>>
>>> KOSAKI-san, do I remember correctly? Do you still think it's an
>>> issue?
>>>
>>> Personally, I still think it's illogical that !swappiness allows
>>> swapping and would love to see this patch go in.
>>
>> Thank you. I brought back to memory it. Unfortunately DB folks are
>> still mainly using RHEL5 generation distros. At that time, swapiness=0
>> doesn't mean disabling swap.
>>
>> They want, "don't swap as far as kernel has any file cache page". but
>> linux don't have such feature. then they used swappiness for emulate
>> it. So, I think this patch clearly make userland harm. Because of, we
>> don't have an alternative way.
>
> If they expect the behavior that "don't swap as far as kernel
> has any file cache page", this patch definitely helps them
> because if we set swappiness==0, kernel does not swap out
> *until* nr_free + nr_filebacked < high watermark in the zone.
> It means kernel begins to swap out when nr_free + nr_filebacked
> becomes less than high watermark.
>
> But, yes, this patch actually changes the behavior with
> swappiness==0 and so it may make userland harm.
>
> How about introducing new value e.g -1 to avoid swap and
> maintain compatibility?

I have run into problems with heavy swapping with swappiness==0 and was pointed to this thread ( http://marc.info/?l=linux-mm&m=133522782307215 )

I strongly believe that Linux should have a way to turn off swapping unless absolutely necessary. This means that users like us can run with swap present for emergency use, rather than having to disable it because of the side effects.

Personally, I feel that swappiness==0 should have this (intuitive) meaning, and that people running RHEL5 are extremely unlikely to run 3.5 kernels(!)

However, swappiness==-1 or some other hack is definitely better than no patch.

Richard.
|
From: Satoru M. <sat...@hd...> - 2012-04-21 00:21:51
|
Hi,
Sorry for my late reply.
On 04/04/2012 01:38 PM, KOSAKI Motohiro wrote:
> (4/3/12 4:25 AM), Jerome Marchand wrote:
>> On 04/02/2012 07:10 PM, KOSAKI Motohiro wrote:
>>> 2012/3/30 Satoru Moriya <sat...@hd...>:
>>>> So the kernel reclaims pages like following.
>>>>
>>>> nr_free + nr_filebacked >= watermark_high: reclaim only filebacked pages
>>>> nr_free + nr_filebacked < watermark_high: reclaim only anonymous pages
>>>
>>> How?
>>
>> get_scan_count() checks that case explicitly:
>>
>> 	if (global_reclaim(sc)) {
>> 		free = zone_page_state(mz->zone, NR_FREE_PAGES);
>> 		/* If we have very few page cache pages,
>> 		   force-scan anon pages. */
>> 		if (unlikely(file + free <= high_wmark_pages(mz->zone))) {
>> 			fraction[0] = 1;
>> 			fraction[1] = 0;
>> 			denominator = 1;
>> 			goto out;
>> 		}
>> 	}
>
> Eek. This is silly. Nowaday many people enabled THP and it increase zone watermark.
> so, high watermask is not good threshold anymore.
Ah yes, it is not so small now.
On 4GB server, without THP min_free_kbytes is 8113 but
with THP it is 67584.
How about using low watermark or min watermark?
Are they still big?
...or should we use other value?
Regards,
Satoru
|
|
From: <teb...@wa...> - 2012-04-14 16:09:39
|
Build your professional website with Joomla, WITHOUT PROGRAMMING

Whether you are a programmer, designer, webmaster or simply a computer hobbyist does not matter: once you know how to use Joomla, you will always produce professional-looking web pages with just a few hours of work and without writing a single line of code.

We offer the most complete Joomla courses in Argentina. Visit our website to see the syllabus, duration, prices, schedules and much more about Joomla and web development.

To stop receiving this advertising, send an email to [address shown as an image] with "Borrar" (Delete) as the subject.
|
From: Seiji A. <sei...@hd...> - 2012-04-13 15:44:39
|
Matthew,

Do you have any comment on this patch?

Seiji

> -----Original Message-----
> From: Seiji Aguchi
> Sent: Wednesday, March 07, 2012 5:50 PM
> To: lin...@vg...; Luck, Tony (ton...@in...); Chen Gong (gon...@li...); Matthew Garrett (mj...@re...); dz...@re...
> Cc: dle...@li...; Satoru Moriya
> Subject: [PATCH] efi: Avoid sysfs spew on reboot and panic
>
> Hi,
>
> This patch just modified Matthew's patch which has not included in upstream to fit current upstream code.
>
> https://lkml.org/lkml/2011/9/20/468
>
> Right now all pstore accesses to efivars will delete or create new sysfs nodes. This is less than ideal if we've panicked or rebooting for following reasons.
> - efi_pstore may not work if kernel panics in interrupt context, since
>   cpu can sleep while creating sysfs files.
> - we don't need to create sysfs if we've panicked or rebooting, because
>   no one can access to it.
>
>
> Signed-off-by: Seiji Aguchi <sei...@hd...>
> Signed-off-by: Matthew Garrett <mj...@re...>
> ---
>  drivers/firmware/efivars.c |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
> index d25599f..34c8890 100644
> --- a/drivers/firmware/efivars.c
> +++ b/drivers/firmware/efivars.c
> @@ -550,6 +550,16 @@ static int efi_pstore_write(enum pstore_type_id type,
>  
>  	spin_unlock(&efivars->lock);
>  
> +	/*
> +	 * If it's more severe than KMSG_DUMP_OOPS then we're already dead.
> +	 * Don't bother playing with sysfs.
> +	 */
> +
> +	if (reason != KMSG_DUMP_OOPS) {
> +		*id = part;
> +		return ret;
> +	}
> +
>  	if (found)
>  		efivar_unregister(found);
>  
> -- 
> 1.7.1