|
From: Satoru M. <sat...@hd...> - 2012-03-02 17:37:03
|
Sometimes we'd like to avoid swapping out anonymous memory. In
particular, we'd like to avoid swapping out pages of important
processes or process groups while there is a reasonable amount of
pagecache in RAM, so that we can satisfy our customers' requirements.
OTOH, we can control how aggressively the kernel swaps memory pages
with /proc/sys/vm/swappiness globally and
/sys/fs/cgroup/memory/memory.swappiness for each memcg.
But with the current reclaim implementation, the kernel may swap out
even if we set swappiness==0 and there is pagecache in RAM.
This patch changes the behavior with swappiness==0. If we set
swappiness==0, the kernel does not swap out at all
(for global reclaim, until the amount of free pages and filebacked
pages in a zone has been reduced to something very, very small:
nr_free + nr_filebacked < high watermark).
Any comments are welcome.
Regards,
Satoru Moriya
Signed-off-by: Satoru Moriya <sat...@hd...>
---
mm/vmscan.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c52b235..27dc3e8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1983,10 +1983,10 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
          * proportional to the fraction of recently scanned pages on
          * each list that were recently referenced and in active use.
          */
-        ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
+        ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1);
         ap /= reclaim_stat->recent_rotated[0] + 1;
 
-        fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
+        fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
         fp /= reclaim_stat->recent_rotated[1] + 1;
         spin_unlock_irq(&mz->zone->lru_lock);
@@ -1999,7 +1999,7 @@ out:
                 unsigned long scan;
 
                 scan = zone_nr_lru_pages(mz, lru);
-                if (priority || noswap) {
+                if (priority || noswap || !vmscan_swappiness(mz, sc)) {
                         scan >>= priority;
                         if (!scan && force_scan)
                                 scan = SWAP_CLUSTER_MAX;
--
1.7.6.4
|
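The effect of the first hunk in the patch above can be illustrated with a small standalone C sketch. This is not kernel code: the struct and function names (`lru_stats`, `weight_before`, `weight_after`) are made up for illustration, and only the arithmetic follows the hunk. It assumes that anon_prio is the swappiness value and file_prio is 200 minus it, hence the 0..200 bound checked below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one LRU type's reclaim_stat counters. */
struct lru_stats {
	uint64_t recent_scanned;
	uint64_t recent_rotated;
};

/* Weight as computed before the patch: (prio + 1) is never zero. */
uint64_t weight_before(uint64_t prio, struct lru_stats s)
{
	assert(prio <= 200);	/* assumed range of anon_prio/file_prio */
	return (prio + 1) * (s.recent_scanned + 1) / (s.recent_rotated + 1);
}

/* Weight as computed after the patch: prio == 0 yields exactly zero. */
uint64_t weight_after(uint64_t prio, struct lru_stats s)
{
	assert(prio <= 200);	/* same assumed range */
	return prio * (s.recent_scanned + 1) / (s.recent_rotated + 1);
}
```

With a priority of 0, the old form still produces a nonzero weight whenever the rotated count is not much larger than the scanned count, so the anon lists keep a share of the scan. The patched form gives the anon lists no share at all when swappiness is 0.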
|
From: Rik v. R. <ri...@re...> - 2012-03-02 22:47:54
|
On 03/02/2012 12:36 PM, Satoru Moriya wrote:
> Sometimes we'd like to avoid swapping out anonymous memory
> in particular, avoid swapping out pages of important process or
> process groups while there is a reasonable amount of pagecache
> on RAM so that we can satisfy our customers' requirements.
>
> OTOH, we can control how aggressive the kernel will swap memory pages
> with /proc/sys/vm/swappiness for global and
> /sys/fs/cgroup/memory/memory.swappiness for each memcg.
>
> But with current reclaim implementation, the kernel may swap out
> even if we set swappiness==0 and there is pagecache on RAM.
>
> This patch changes the behavior with swappiness==0. If we set
> swappiness==0, the kernel does not swap out completely
> (for global reclaim until the amount of free pages and filebacked
> pages in a zone has been reduced to something very very small
> (nr_free + nr_filebacked < high watermark)).
>
> Any comments are welcome.
>
> Regards,
> Satoru Moriya
>
> Signed-off-by: Satoru Moriya <sat...@hd...>
> ---
> mm/vmscan.c | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c52b235..27dc3e8 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1983,10 +1983,10 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
> * proportional to the fraction of recently scanned pages on
> * each list that were recently referenced and in active use.
> */
> - ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
> + ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1);
> ap /= reclaim_stat->recent_rotated[0] + 1;
>
> - fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
> + fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
> fp /= reclaim_stat->recent_rotated[1] + 1;
> spin_unlock_irq(&mz->zone->lru_lock);
ACK on this bit of the patch.
> @@ -1999,7 +1999,7 @@ out:
> unsigned long scan;
>
> scan = zone_nr_lru_pages(mz, lru);
> - if (priority || noswap) {
> + if (priority || noswap || !vmscan_swappiness(mz, sc)) {
> scan >>= priority;
> if (!scan && force_scan)
> scan = SWAP_CLUSTER_MAX;
However, I do not understand why we fail to scale
the number of pages we want to scan with priority
if "noswap".

For that matter, surely if we do not want to swap
out anonymous pages, we WANT to go into this if
branch, in order to make sure we set "scan" to 0?

        scan = div64_u64(scan * fraction[file], denominator);

With your patch and swappiness=0, or no swap space, it
looks like we do not zero out "scan" and may end up
scanning anonymous pages.

Am I overlooking something? Is this correct?

I mean, it is Friday and my brain is very full...
--
All rights reversed
|
|
From: Satoru M. <sat...@hd...> - 2012-03-02 23:43:23
|
Hi Rik,
Thank you for reviewing.
On 03/02/2012 05:47 PM, Rik van Riel wrote:
> On 03/02/2012 12:36 PM, Satoru Moriya wrote:
>> @@ -1999,7 +1999,7 @@ out:
>> unsigned long scan;
>>
>> scan = zone_nr_lru_pages(mz, lru);
>> - if (priority || noswap) {
>> + if (priority || noswap || !vmscan_swappiness(mz, sc)) {
>> scan >>= priority;
>> if (!scan && force_scan)
>> scan = SWAP_CLUSTER_MAX;
>
> However, I do not understand why we fail to scale the number of pages
> we want to scan with priority if "noswap".
>
> For that matter, surely if we do not want to swap out anonymous pages,
> we WANT to go into this if branch, in order to make sure we set "scan"
> to 0?
>
> scan = div64_u64(scan * fraction[file], denominator);
>
> With your patch and swappiness=0, or no swap space, it looks like we
> do not zero out "scan" and may end up scanning anonymous pages.
With my patch, if swappiness==0 or noswap==1, fraction[file] is
set to 0. As a result, scan will be set to 0, too.
> Am I overlooking something? Is this correct?
>
> I mean, it is Friday and my brain is very full...
Have a nice weekend ;)
Regards,
Satoru
|
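The exchange above turns on what happens to "scan" inside the if branch. Below is a simplified, standalone C model of that branch with the patch applied. It is a sketch, not the kernel function: the name `scan_target` and its flat parameter list are invented for illustration, the value 32 for SWAP_CLUSTER_MAX is assumed from kernels of that era, and `fraction` stands for fraction[file] of the list being sized (index 0 for the anon lists).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SWAP_CLUSTER_MAX 32ULL	/* assumed value, see the note above */

/*
 * Simplified model of the per-LRU scan target with the patch applied.
 * All inputs are plain values here; in the kernel they come from the
 * zone, the scan_control and get_scan_count() itself.
 */
uint64_t scan_target(uint64_t lru_pages, unsigned int priority, bool noswap,
		     unsigned int swappiness, bool force_scan,
		     uint64_t fraction, uint64_t denominator)
{
	uint64_t scan = lru_pages;

	assert(denominator != 0);
	assert(priority < 64);	/* keep the shift well defined */

	if (priority || noswap || swappiness == 0) {
		scan >>= priority;
		if (!scan && force_scan)
			scan = SWAP_CLUSTER_MAX;
		/* A zero fraction for this list zeroes its scan target. */
		scan = scan * fraction / denominator;
	}
	return scan;
}
```

In this model, an anon list with swappiness 0 and a fraction of 0 always ends up with a scan target of 0, even at priority 0, which matches Satoru's answer. At priority 0 with a nonzero swappiness and swap available, the branch is skipped and the whole list size is returned unscaled.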
|
From: Minchan K. <mi...@ke...> - 2012-03-04 06:58:15
|
Hi Satoru,

On Fri, Mar 02, 2012 at 12:36:40PM -0500, Satoru Moriya wrote:
> Sometimes we'd like to avoid swapping out anonymous memory
> in particular, avoid swapping out pages of important process or
> process groups while there is a reasonable amount of pagecache
> on RAM so that we can satisfy our customers' requirements.
>
> OTOH, we can control how aggressive the kernel will swap memory pages
> with /proc/sys/vm/swappiness for global and
> /sys/fs/cgroup/memory/memory.swappiness for each memcg.
>
> But with current reclaim implementation, the kernel may swap out
> even if we set swappiness==0 and there is pagecache on RAM.
>
> This patch changes the behavior with swappiness==0. If we set
> swappiness==0, the kernel does not swap out completely
> (for global reclaim until the amount of free pages and filebacked
> pages in a zone has been reduced to something very very small
> (nr_free + nr_filebacked < high watermark)).
>
> Any comments are welcome.
>
> Regards,
> Satoru Moriya
>
> Signed-off-by: Satoru Moriya <sat...@hd...>

Acked-by: Minchan Kim <mi...@ke...>

I agree with this feature, but the current code is rather ugly in
terms of readability. It's not your fault, because it is caused by
adding 'noswap' to avoid scanning of anon pages when priority is 0.
You just used that code. :)

Hillf's version looks to be a much cleaner refactoring, so after we
merge your patch, we can tidy it up with Hillf's patch.
|
|
From: Satoru M. <sat...@hd...> - 2012-03-05 21:48:42
|
Hi Minchan,

Thank you for reviewing.

On 03/04/2012 01:57 AM, Minchan Kim wrote:
> On Fri, Mar 02, 2012 at 12:36:40PM -0500, Satoru Moriya wrote:
>
> I agree this feature but current code is rather ugly on readability.

I agree with you.

> Hillf's version looks to be much clean refactoring so after we merge
> your patch, we can tidy it up with Hillf's patch.

Thanks. No problem.

Regards,
Satoru
|
|
From: Rik v. R. <ri...@re...> - 2012-03-05 13:50:03
|
On 03/02/2012 12:36 PM, Satoru Moriya wrote:
> Sometimes we'd like to avoid swapping out anonymous memory
> in particular, avoid swapping out pages of important process or
> process groups while there is a reasonable amount of pagecache
> on RAM so that we can satisfy our customers' requirements.
>
> OTOH, we can control how aggressive the kernel will swap memory pages
> with /proc/sys/vm/swappiness for global and
> /sys/fs/cgroup/memory/memory.swappiness for each memcg.
>
> But with current reclaim implementation, the kernel may swap out
> even if we set swappiness==0 and there is pagecache on RAM.
>
> This patch changes the behavior with swappiness==0. If we set
> swappiness==0, the kernel does not swap out completely
> (for global reclaim until the amount of free pages and filebacked
> pages in a zone has been reduced to something very very small
> (nr_free + nr_filebacked < high watermark)).
>
> Any comments are welcome.

My mind is now rested by doing a nice 10 mile hike :)

> Signed-off-by: Satoru Moriya <sat...@hd...>

Reviewed-by: Rik van Riel <ri...@re...>

--
All rights reversed
|
|
From: Johannes W. <jw...@re...> - 2012-03-05 21:56:16
|
On Fri, Mar 02, 2012 at 12:36:40PM -0500, Satoru Moriya wrote:
> Sometimes we'd like to avoid swapping out anonymous memory
> in particular, avoid swapping out pages of important process or
> process groups while there is a reasonable amount of pagecache
> on RAM so that we can satisfy our customers' requirements.
>
> OTOH, we can control how aggressive the kernel will swap memory pages
> with /proc/sys/vm/swappiness for global and
> /sys/fs/cgroup/memory/memory.swappiness for each memcg.
>
> But with current reclaim implementation, the kernel may swap out
> even if we set swappiness==0 and there is pagecache on RAM.
>
> This patch changes the behavior with swappiness==0. If we set
> swappiness==0, the kernel does not swap out completely
> (for global reclaim until the amount of free pages and filebacked
> pages in a zone has been reduced to something very very small
> (nr_free + nr_filebacked < high watermark)).
>
> Any comments are welcome.

Last time I tried that (getting rid of sc->may_swap, using
!swappiness), it was rejected, as there were users who relied on
swapping very slowly with this setting.

KOSAKI-san, do I remember correctly? Do you still think it's an
issue?

Personally, I still think it's illogical that !swappiness allows
swapping and would love to see this patch go in.
|
|
From: KOSAKI M. <kos...@jp...> - 2012-03-07 17:19:23
|
On 3/5/2012 4:56 PM, Johannes Weiner wrote:
> On Fri, Mar 02, 2012 at 12:36:40PM -0500, Satoru Moriya wrote:
>> Sometimes we'd like to avoid swapping out anonymous memory
>> in particular, avoid swapping out pages of important process or
>> process groups while there is a reasonable amount of pagecache
>> on RAM so that we can satisfy our customers' requirements.
>>
>> OTOH, we can control how aggressive the kernel will swap memory pages
>> with /proc/sys/vm/swappiness for global and
>> /sys/fs/cgroup/memory/memory.swappiness for each memcg.
>>
>> But with current reclaim implementation, the kernel may swap out
>> even if we set swappiness==0 and there is pagecache on RAM.
>>
>> This patch changes the behavior with swappiness==0. If we set
>> swappiness==0, the kernel does not swap out completely
>> (for global reclaim until the amount of free pages and filebacked
>> pages in a zone has been reduced to something very very small
>> (nr_free + nr_filebacked < high watermark)).
>>
>> Any comments are welcome.
>
> Last time I tried that (getting rid of sc->may_swap, using
> !swappiness), it was rejected it as there were users who relied on
> swapping very slowly with this setting.
>
> KOSAKI-san, do I remember correctly? Do you still think it's an
> issue?
>
> Personally, I still think it's illogical that !swappiness allows
> swapping and would love to see this patch go in.

Thank you. That brought it back to memory. Unfortunately, DB folks
are still mainly using RHEL5-generation distros. At that time,
swappiness=0 didn't mean disabling swap.

They want "don't swap as long as the kernel has any file cache
pages", but Linux doesn't have such a feature, so they used
swappiness to emulate it. So I think this patch clearly harms
userland, because we don't have an alternative way.
|
|
From: Satoru M. <sat...@hd...> - 2012-03-07 21:51:18
|
On 03/07/2012 12:19 PM, KOSAKI Motohiro wrote:
> On 3/5/2012 4:56 PM, Johannes Weiner wrote:
>> On Fri, Mar 02, 2012 at 12:36:40PM -0500, Satoru Moriya wrote:
>>>
>>> This patch changes the behavior with swappiness==0. If we set
>>> swappiness==0, the kernel does not swap out completely (for global
>>> reclaim until the amount of free pages and filebacked pages in a
>>> zone has been reduced to something very very small (nr_free +
>>> nr_filebacked < high watermark)).
>>>
>>> Any comments are welcome.
>>
>> Last time I tried that (getting rid of sc->may_swap, using
>> !swappiness), it was rejected it as there were users who relied on
>> swapping very slowly with this setting.
>>
>> KOSAKI-san, do I remember correctly? Do you still think it's an
>> issue?
>>
>> Personally, I still think it's illogical that !swappiness allows
>> swapping and would love to see this patch go in.
>
> Thank you. I brought back to memory it. Unfortunately DB folks are
> still mainly using RHEL5 generation distros. At that time, swappiness=0
> doesn't mean disabling swap.
>
> They want, "don't swap as far as kernel has any file cache page". but
> linux don't have such feature. then they used swappiness for emulate
> it. So, I think this patch clearly make userland harm. Because of, we
> don't have an alternative way.

If they expect the behavior "don't swap as long as the kernel has
any file cache pages", this patch definitely helps them, because if
we set swappiness==0, the kernel does not swap out *until*
nr_free + nr_filebacked < high watermark in the zone.
That is, the kernel begins to swap out when nr_free + nr_filebacked
becomes less than the high watermark.

But, yes, this patch actually changes the behavior with
swappiness==0, and so it may harm userland.

How about introducing a new value, e.g. -1, to avoid swap and
maintain compatibility?

Regards,
Satoru
|
|
From: Richard D. <ri...@ar...> - 2012-04-24 08:46:03
|
On 03/07/2012 18:18 PM, Satoru Moriya wrote:
> On 03/07/2012 12:19 PM, KOSAKI Motohiro wrote:
>> On 3/5/2012 4:56 PM, Johannes Weiner wrote:
>>> On Fri, Mar 02, 2012 at 12:36:40PM -0500, Satoru Moriya wrote:
>>>>
>>>> This patch changes the behavior with swappiness==0. If we set
>>>> swappiness==0, the kernel does not swap out completely (for global
>>>> reclaim until the amount of free pages and filebacked pages in a
>>>> zone has been reduced to something very very small (nr_free +
>>>> nr_filebacked < high watermark)).
>>>>
>>>> Any comments are welcome.
>>>
>>> Last time I tried that (getting rid of sc->may_swap, using
>>> !swappiness), it was rejected it as there were users who relied on
>>> swapping very slowly with this setting.
>>>
>>> KOSAKI-san, do I remember correctly? Do you still think it's an
>>> issue?
>>>
>>> Personally, I still think it's illogical that !swappiness allows
>>> swapping and would love to see this patch go in.
>>
>> Thank you. I brought back to memory it. Unfortunately DB folks are
>> still mainly using RHEL5 generation distros. At that time, swappiness=0
>> doesn't mean disabling swap.
>>
>> They want, "don't swap as far as kernel has any file cache page". but
>> linux don't have such feature. then they used swappiness for emulate
>> it. So, I think this patch clearly make userland harm. Because of, we
>> don't have an alternative way.
>
> If they expect the behavior that "don't swap as far as kernel
> has any file cache page", this patch definitely helps them
> because if we set swappiness==0, kernel does not swap out
> *until* nr_free + nr_filebacked < high watermark in the zone.
> It means kernel begins to swap out when nr_free + nr_filebacked
> becomes less than high watermark.
>
> But, yes, this patch actually changes the behavior with
> swappiness==0 and so it may make userland harm.
>
> How about introducing new value e.g -1 to avoid swap and
> maintain compatibility?

I have run into problems with heavy swapping with swappiness==0 and
was pointed to this thread
( http://marc.info/?l=linux-mm&m=133522782307215 )

I strongly believe that Linux should have a way to turn off swapping
unless absolutely necessary. This means that users like us can run
with swap present for emergency use, rather than having to disable it
because of the side effects.

Personally, I feel that swappiness==0 should have this (intuitive)
meaning, and that people running RHEL5 are extremely unlikely to run
3.5 kernels(!)

However, swappiness==-1 or some other hack is definitely better than
no patch.

Richard.
|
|
From: Christoph L. <cl...@li...> - 2012-04-26 15:17:15
|
On Tue, 24 Apr 2012, Richard Davies wrote:
> I strongly believe that Linux should have a way to turn off swapping unless
> absolutely necessary. This means that users like us can run with swap
> present for emergency use, rather than having to disable it because of the
> side effects.

Agree. And this operation mode should be the default behavior, given
that swapping is a very slow and tedious process these days.
|
|
From: Rik v. R. <ri...@re...> - 2012-04-27 13:55:54
|
On 04/26/2012 10:50 AM, Christoph Lameter wrote:
> On Tue, 24 Apr 2012, Richard Davies wrote:
>
>> I strongly believe that Linux should have a way to turn off swapping unless
>> absolutely necessary. This means that users like us can run with swap
>> present for emergency use, rather than having to disable it because of the
>> side effects.
>
> Agree. And this operation mode should be the default behavior given that
> swapping is a very slow and tedious process these days.

I believe that is a bad idea.

With cgroups, the situation is a whole lot less obvious than with
the simple test done in this patch.

Let's see how the 3.4 code behaves, and if we need any additional
changes to reduce swapping and step up reclaiming of page cache...

--
All rights reversed
|
|
From: KOSAKI M. <kos...@gm...> - 2012-04-26 15:37:09
|
(4/26/12 10:50 AM), Christoph Lameter wrote:
> On Tue, 24 Apr 2012, Richard Davies wrote:
>
>> I strongly believe that Linux should have a way to turn off swapping unless
>> absolutely necessary. This means that users like us can run with swap
>> present for emergency use, rather than having to disable it because of the
>> side effects.
>
> Agree. And this operation mode should be the default behavior given that
> swapping is a very slow and tedious process these days.

Even though the current patch is not optimal, I don't disagree with
this opinion. Can you please explain your use case? Why don't you use
swapoff?

Off topic: I hope Linux is going to aim for good clustered swap I/O
in the future. Especially when using THP, 4k-sized I/O is not really
good.
|
|
From: Richard D. <ric...@el...> - 2012-04-26 16:09:18
|
KOSAKI Motohiro wrote:
> Christoph Lameter wrote:
> > Richard Davies wrote:
> >
> > > I strongly believe that Linux should have a way to turn off swapping unless
> > > absolutely necessary. This means that users like us can run with swap
> > > present for emergency use, rather than having to disable it because of the
> > > side effects.
> >
> > Agree. And this operation mode should be the default behavior given that
> > swapping is a very slow and tedious process these days.
>
> Even though current patch is not optimal, I don't disagree this opinion. Can
> you please explain your use case? Why don't you use swapoff?

My use case is that I have large (64 or 128GB RAM) qemu-kvm
virtualization hosts, running many (20-50) VMs.

Typically the total memory in use is less than physical memory. In
these cases I would like the virtualization host to run without any
swapping. I have set swappiness==0, but in practice I get big load
spikes from swapping. See
http://marc.info/?l=linux-mm&m=133517452117581

I don't want to run swapoff, because sometimes I will need to
provision slightly more VMs than physical memory, and in these cases
I would rather the system ran with a little swap in use than have
the OOM killer kick in.

Richard.
|
|
From: Christoph L. <cl...@li...> - 2012-04-26 18:20:22
|
On Thu, 26 Apr 2012, KOSAKI Motohiro wrote:
> (4/26/12 10:50 AM), Christoph Lameter wrote:
> > On Tue, 24 Apr 2012, Richard Davies wrote:
> >
> > > I strongly believe that Linux should have a way to turn off swapping
> > > unless absolutely necessary. This means that users like us can run with
> > > swap present for emergency use, rather than having to disable it because
> > > of the side effects.
> >
> > Agree. And this operation mode should be the default behavior given that
> > swapping is a very slow and tedious process these days.
>
> Even though current patch is not optimal, I don't disagree this opinion. Can
> you please explain your use case? Why don't you use swapoff?

Because I do not want to have systems go OOM.

In an emergency let's use swap (and maybe generate some sort of
alert if that happens).

> Off topic: I hope linux is going to aim good swap clustered io in future.
> Especially when using THP, 4k size io is not really good.

Swap to regular disks is going to be an ever greater problem, since
the access speed of rotational media has not changed much, whereas
the processing performance of the CPU has increased significantly.
There is an ever-increasing gap in speed.
|
|
From: Satoru M. <sat...@hd...> - 2012-04-24 22:15:09
|
On 04/24/2012 04:20 AM, Richard Davies wrote:
>
> I have run into problems with heavy swapping with swappiness==0 and
> was pointed to this thread (
> http://marc.info/?l=linux-mm&m=133522782307215 )

Did you test this patch with your workload?
If yes, how did it come out?

> I strongly believe that Linux should have a way to turn off swapping
> unless absolutely necessary. This means that users like us can run
> with swap present for emergency use, rather than having to disable it
> because of the side effects.

Agreed. That is why I proposed the patch.

> Personally, I feel that swappiness==0 should have this (intuitive)
> meaning, and that people running RHEL5 are extremely unlikely to run
> 3.5 kernels(!)
>
> However, swappiness==-1 or some other hack is definitely better than
> no patch.

Regards,
Satoru
|
|
From: Richard D. <ric...@el...> - 2012-04-26 14:27:09
|
Satoru Moriya wrote:
> > I have run into problems with heavy swapping with swappiness==0 and
> > was pointed to this thread (
> > http://marc.info/?l=linux-mm&m=133522782307215 )
>
> Did you test this patch with your workload?

I haven't yet tested this patch. It takes a long time since these are
production machines, and the bug itself takes several weeks of
production use to really show up.

Rik van Riel has pointed out a lot of VM tweaks that he put into 3.4:
http://marc.info/?l=linux-mm&m=133536506926326

My intention is to reboot half of our machines into plain 3.4 once it
is out, and half onto 3.4 + your patch.

Then we can compare behaviour.

Will your patch apply cleanly on 3.4?

Richard.
|
|
From: KOSAKI M. <kos...@gm...> - 2012-04-26 15:42:23
|
On Thu, Apr 26, 2012 at 10:26 AM, Richard Davies <ric...@el...> wrote:
> Satoru Moriya wrote:
>> > I have run into problems with heavy swapping with swappiness==0 and
>> > was pointed to this thread (
>> > http://marc.info/?l=linux-mm&m=133522782307215 )
>>
>> Did you test this patch with your workload?
>
> I haven't yet tested this patch. It takes a long time since these are
> production machines, and the bug itself takes several weeks of production
> use to really show up.
>
> Rik van Riel has pointed out a lot of VM tweaks that he put into 3.4:
> http://marc.info/?l=linux-mm&m=133536506926326
>
> My intention is to reboot half of our machines into plain 3.4 once it is
> out, and half onto 3.4 + your patch.
>
> Then we can compare behaviour.
>
> Will your patch apply cleanly on 3.4?

Note: this patch doesn't solve your issue. This patch means that
when very little swap I/O is occurring, it changes to 0. But you
said you are seeing eager swap I/O. As Dave already pointed out,
your machine has a buffer head issue.

So, this thread is pointless.
|
|
From: Rik v. R. <ri...@re...> - 2012-05-07 20:10:13
|
On 04/26/2012 11:41 AM, KOSAKI Motohiro wrote:
> On Thu, Apr 26, 2012 at 10:26 AM, Richard Davies
> <ric...@el...> wrote:
>> Satoru Moriya wrote:
>>>> I have run into problems with heavy swapping with swappiness==0 and
>>>> was pointed to this thread (
>>>> http://marc.info/?l=linux-mm&m=133522782307215 )
>>>
>>> Did you test this patch with your workload?
>>
>> I haven't yet tested this patch. It takes a long time since these are
>> production machines, and the bug itself takes several weeks of production
>> use to really show up.
>>
>> Rik van Riel has pointed out a lot of VM tweaks that he put into 3.4:
>> http://marc.info/?l=linux-mm&m=133536506926326
>>
>> My intention is to reboot half of our machines into plain 3.4 once it is
>> out, and half onto 3.4 + your patch.
>>
>> Then we can compare behaviour.
>>
>> Will your patch apply cleanly on 3.4?
>
> Note. This patch doesn't solve your issue. This patch mean,
> when occurring very few swap io, it change to 0. But you said
> you are seeing eager swap io. As Dave already pointed out, your
> machine have buffer head issue.
>
> So, this thread is pointless.

Running KVM guests directly off block devices results in a lot
of buffer cache.

I suspect that this patch will in fact fix Richard's issue.

The patch is small, fairly simple and looks like it will fix
people's problems. It also makes swappiness=0 behave the way
most people seem to imagine it would work.

If it works for a few people (test results), I believe we
might as well merge it.

Yes, for cgroups we may need additional logic, but we can
sort that out as we go along.

--
All rights reversed
|
|
From: Minchan K. <mi...@ke...> - 2012-05-08 00:20:25
|
On 05/08/2012 05:09 AM, Rik van Riel wrote:
> On 04/26/2012 11:41 AM, KOSAKI Motohiro wrote:
>> On Thu, Apr 26, 2012 at 10:26 AM, Richard Davies
>> <ric...@el...> wrote:
>>> Satoru Moriya wrote:
>>>>> I have run into problems with heavy swapping with swappiness==0 and
>>>>> was pointed to this thread (
>>>>> http://marc.info/?l=linux-mm&m=133522782307215 )
>>>>
>>>> Did you test this patch with your workload?
>>>
>>> I haven't yet tested this patch. It takes a long time since these are
>>> production machines, and the bug itself takes several weeks of
>>> production use to really show up.
>>>
>>> Rik van Riel has pointed out a lot of VM tweaks that he put into 3.4:
>>> http://marc.info/?l=linux-mm&m=133536506926326
>>>
>>> My intention is to reboot half of our machines into plain 3.4 once it is
>>> out, and half onto 3.4 + your patch.
>>>
>>> Then we can compare behaviour.
>>>
>>> Will your patch apply cleanly on 3.4?
>>
>> Note. This patch doesn't solve your issue. This patch mean,
>> when occurring very few swap io, it change to 0. But you said
>> you are seeing eager swap io. As Dave already pointed out, your
>> machine have buffer head issue.
>>
>> So, this thread is pointless.
>
> Running KVM guests directly off block devices results in a lot
> of buffer cache.
>
> I suspect that this patch will in fact fix Richard's issue.
>
> The patch is small, fairly simple and looks like it will fix
> people's problems. It also makes swappiness=0 behave the way
> most people seem to imagine it would work.
>
> If it works for a few people (test results), I believe we
> might as well merge it.
>
> Yes, for cgroups we may need additional logic, but we can
> sort that out as we go along.
>

I agree with Rik's opinion absolutely.

--
Kind regards,
Minchan Kim
|
|
From: Richard D. <ric...@el...> - 2012-05-21 07:12:41
|
Hi Satoru,

Rik van Riel wrote:
> KOSAKI Motohiro wrote:
> > Richard Davies wrote:
> > > Satoru Moriya wrote:
> > > > > I have run into problems with heavy swapping with swappiness==0 and
> > > > > was pointed to this thread (
> > > > > http://marc.info/?l=linux-mm&m=133522782307215 )
> > > >
> > > > Did you test this patch with your workload?
> > >
> > > I haven't yet tested this patch. It takes a long time since these are
> > > production machines, and the bug itself takes several weeks of
> > > production use to really show up.
> > >
> > > Rik van Riel has pointed out a lot of VM tweaks that he put into 3.4:
> > > http://marc.info/?l=linux-mm&m=133536506926326
> > >
> > > My intention is to reboot half of our machines into plain 3.4 once it
> > > is out, and half onto 3.4 + your patch.
> > >
> > > Then we can compare behaviour.
> > >
> > > Will your patch apply cleanly on 3.4?
> >
> > Note. This patch doesn't solve your issue. This patch mean,
> > when occurring very few swap io, it change to 0. But you said
> > you are seeing eager swap io. As Dave already pointed out, your
> > machine have buffer head issue.
> >
> > So, this thread is pointless.
>
> Running KVM guests directly off block devices results in a lot
> of buffer cache.
>
> I suspect that this patch will in fact fix Richard's issue.
>
> The patch is small, fairly simple and looks like it will fix
> people's problems. It also makes swappiness=0 behave the way
> most people seem to imagine it would work.
>
> If it works for a few people (test results), I believe we
> might as well merge it.
>
> Yes, for cgroups we may need additional logic, but we can
> sort that out as we go along.

Now that 3.4 is out with Rik's fixes, I'm keen to start testing with
and without this extra patch.

Satoru - should I just apply your original patch (most likely), or do
you need to update for the final released kernel?

Thanks,
Richard.
|
|
From: Satoru M. <sat...@hd...> - 2012-05-21 13:39:47
|
Hi Richard,

On 05/21/2012 03:12 AM, Richard Davies wrote:
> Now that 3.4 is out with Rik's fixes, I'm keen to start testing with
> and without this extra patch.
>
> Satoru - should I just apply your original patch (most likely), or do
> you need to update for the final released kernel?

Thank you for testing!
I believe you can apply the patch without any updates.

Regards,
Satoru
|
|
From: Satoru M. <sat...@hd...> - 2012-03-30 22:44:45
|
Hello Kosaki-san,

On 03/07/2012 01:18 PM, Satoru Moriya wrote:
> On 03/07/2012 12:19 PM, KOSAKI Motohiro wrote:
>> Thank you. I brought back to memory it. Unfortunately DB folks are
>> still mainly using RHEL5 generation distros. At that time,
>> swappiness=0 doesn't mean disabling swap.
>>
>> They want, "don't swap as far as kernel has any file cache page". but
>> linux don't have such feature. then they used swappiness for emulate
>> it. So, I think this patch clearly make userland harm. Because of, we
>> don't have an alternative way.

As I wrote in the previous mail (see below), with this patch
the kernel begins to swap out when the sum of free pages and
filebacked pages drops below watermark_high.

So the kernel reclaims pages as follows:

nr_free + nr_filebacked >= watermark_high: reclaim only filebacked pages
nr_free + nr_filebacked < watermark_high: reclaim only anonymous pages

Do you think this behavior satisfies DB users' requirements?

> If they expect the behavior that "don't swap as far as kernel has any
> file cache page", this patch definitely helps them because if we set
> swappiness==0, kernel does not swap out
> *until* nr_free + nr_filebacked < high watermark in the zone.
> It means kernel begins to swap out when nr_free + nr_filebacked
> becomes less than high watermark.
>
> But, yes, this patch actually changes the behavior with swappiness==0
> and so it may make userland harm.
>
> How about introducing new value e.g -1 to avoid swap and maintain
> compatibility?

Regards,
Satoru
|
|
From: KOSAKI M. <kos...@jp...> - 2012-04-02 17:10:42
|
2012/3/30 Satoru Moriya <sat...@hd...>:
> Hello Kosaki-san,
>
> On 03/07/2012 01:18 PM, Satoru Moriya wrote:
>> On 03/07/2012 12:19 PM, KOSAKI Motohiro wrote:
>>> Thank you. I brought back to memory it. Unfortunately DB folks are
>>> still mainly using RHEL5 generation distros. At that time,
>>> swappiness=0 doesn't mean disabling swap.
>>>
>>> They want, "don't swap as far as kernel has any file cache page". but
>>> linux don't have such feature. then they used swappiness for emulate
>>> it. So, I think this patch clearly make userland harm. Because of, we
>>> don't have an alternative way.
>
> As I wrote in the previous mail(see below), with this patch
> the kernel begins to swap out when the sum of free pages and
> filebacked pages reduces less than watermark_high.
>
> So the kernel reclaims pages like following.
>
> nr_free + nr_filebacked >= watermark_high: reclaim only filebacked pages
> nr_free + nr_filebacked < watermark_high: reclaim only anonymous pages

How?
|
|
From: Jerome M. <jma...@re...> - 2012-04-03 11:25:39
|
On 04/02/2012 07:10 PM, KOSAKI Motohiro wrote:
> 2012/3/30 Satoru Moriya <sat...@hd...>:
>> Hello Kosaki-san,
>>
>> On 03/07/2012 01:18 PM, Satoru Moriya wrote:
>>> On 03/07/2012 12:19 PM, KOSAKI Motohiro wrote:
>>>> Thank you. I brought back to memory it. Unfortunately DB folks are
>>>> still mainly using RHEL5 generation distros. At that time,
>>>> swappiness=0 doesn't mean disabling swap.
>>>>
>>>> They want, "don't swap as far as kernel has any file cache page". but
>>>> linux don't have such feature. then they used swappiness for emulate
>>>> it. So, I think this patch clearly make userland harm. Because of, we
>>>> don't have an alternative way.
>>
>> As I wrote in the previous mail(see below), with this patch
>> the kernel begins to swap out when the sum of free pages and
>> filebacked pages reduces less than watermark_high.
Actually, this is true only for global reclaims. Reclaims in cgroup can fail
in this case.
>>
>> So the kernel reclaims pages like following.
>>
>> nr_free + nr_filebacked >= watermark_high: reclaim only filebacked pages
>> nr_free + nr_filebacked < watermark_high: reclaim only anonymous pages
>
> How?
get_scan_count() checks that case explicitly:
        if (global_reclaim(sc)) {
                free = zone_page_state(mz->zone, NR_FREE_PAGES);
                /* If we have very few page cache pages,
                   force-scan anon pages. */
                if (unlikely(file + free <= high_wmark_pages(mz->zone))) {
                        fraction[0] = 1;
                        fraction[1] = 0;
                        denominator = 1;
                        goto out;
                }
        }
Regards,
Jerome
|
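The global-reclaim check Jerome quotes can be reduced to a single comparison. The following standalone C sketch models only that decision. The function name `force_scan_anon` and its plain integer parameters are hypothetical stand-ins for the zone counters and high_wmark_pages(). Note that the quoted code compares with `<=`, while the prose summaries earlier in the thread write `<`; the sketch follows the code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Returns true when the quoted global-reclaim check would force-scan
 * anon pages, i.e. when file cache plus free pages in the zone are at
 * or below the high watermark.
 */
bool force_scan_anon(unsigned long nr_file, unsigned long nr_free,
		     unsigned long high_wmark)
{
	assert(nr_file <= ULONG_MAX - nr_free);	/* the sum must not wrap */
	return nr_file + nr_free <= high_wmark;
}
```

Under these assumptions, a zone with plenty of page cache keeps returning false, so with swappiness==0 and the patch applied only file pages are scanned. Only once file plus free pages fall to the high watermark does the anon-only path take over. As Jerome points out, this check applies to global reclaim only, so it says nothing about reclaim inside a cgroup.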