Thread: Re: [Dle-develop] [RFC][PATCH] avoid swapping out with swappiness==0 (Page 2)

Re: [Dle-develop] [RFC][PATCH] avoid swapping out with swappiness==0

From: Satoru M. <sat...@hd...> - 2012-04-03 15:18:10

On 04/03/2012 07:25 AM, Jerome Marchand wrote:
> On 04/02/2012 07:10 PM, KOSAKI Motohiro wrote:
>> 2012/3/30 Satoru Moriya <sat...@hd...>:
>>> Hello Kosaki-san,
>>>
>>> On 03/07/2012 01:18 PM, Satoru Moriya wrote:
>>>> On 03/07/2012 12:19 PM, KOSAKI Motohiro wrote:
>>>>> Thank you, now I remember it. Unfortunately, DB folks are
>>>>> still mainly using RHEL5-generation distros. At that time,
>>>>> swappiness=0 didn't mean disabling swap.
>>>>>
>>>>> What they want is "don't swap as long as the kernel has any file
>>>>> cache pages", but Linux doesn't have such a feature, so they used
>>>>> swappiness to emulate it. So I think this patch clearly harms
>>>>> userland, because we don't have an alternative way.
>>>
>>> As I wrote in the previous mail (see below), with this patch
>>> the kernel begins to swap out when the sum of free pages and
>>> filebacked pages drops below watermark_high.
>
> Actually, this is true only for global reclaim. Reclaim in a cgroup can fail
> in this case.

Right.
Since we are considering the RHEL5 users mentioned above, I believe
they don't care about the cgroup case.

>>>
>>> So the kernel reclaims pages as follows.
>>>
>>> nr_free + nr_filebacked >= watermark_high: reclaim only filebacked pages
>>> nr_free + nr_filebacked <  watermark_high: reclaim only anonymous pages

I made a tiny mistake.
The correct version is as follows ;p

nr_free + nr_filebacked >  watermark_high: reclaim only filebacked pages
nr_free + nr_filebacked <= watermark_high: reclaim only anonymous pages

>> How?
>
> get_scan_count() checks that case explicitly:
>
>        if (global_reclaim(sc)) {
>                free  = zone_page_state(mz->zone, NR_FREE_PAGES);
>                /* If we have very few page cache pages,
>                   force-scan anon pages. */
>                if (unlikely(file + free <= high_wmark_pages(mz->zone))) {
>                        fraction[0] = 1;
>                        fraction[1] = 0;
>                        denominator = 1;
>                        goto out;
>                }
>        }
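A minimal C sketch of the decision described above, for the swappiness==0 case with this patch applied. It is a simplified model, not kernel code: the helper name pick_target_swappiness0() and its parameters are hypothetical, all counts are pages in a single zone, and is_global stands in for global_reclaim(sc). Because the forced-anon override only applies to global reclaim, memcg reclaim with swappiness==0 only ever considers file pages, which is why it can fail.

```c
#include <stdbool.h>

enum reclaim_target {
	RECLAIM_FILE_ONLY,	/* anon share is 0 because swappiness == 0 */
	RECLAIM_ANON_ONLY,	/* forced: fraction[0] = 1, fraction[1] = 0 */
};

/*
 * Hypothetical model of which LRU a reclaim pass targets when
 * swappiness == 0 and the patch is applied. Counts are pages in one zone.
 * is_global stands in for global_reclaim(sc).
 */
static enum reclaim_target pick_target_swappiness0(unsigned long nr_free,
						   unsigned long nr_file,
						   unsigned long high_wmark,
						   bool is_global)
{
	/* Same comparison as get_scan_count(): file + free <= high watermark. */
	if (is_global && nr_file + nr_free <= high_wmark)
		return RECLAIM_ANON_ONLY;
	/*
	 * Otherwise only file pages are scanned. For memcg reclaim this holds
	 * even when nr_file is 0, which is why such reclaim can fail.
	 */
	return RECLAIM_FILE_ONLY;
}
```

For example, with a high watermark of 3042 pages, 3000 free plus 100 filebacked pages (3100) still selects the file list, while 2942 + 100 (exactly 3042) already forces anon scanning because the comparison is `<=`.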

Regards,
Satoru

Re: [Dle-develop] [RFC][PATCH] avoid swapping out with swappiness==0

From: KOSAKI M. <kos...@gm...> - 2012-04-04 17:38:26

(4/3/12 4:25 AM), Jerome Marchand wrote:
> On 04/02/2012 07:10 PM, KOSAKI Motohiro wrote:
>> 2012/3/30 Satoru Moriya <sat...@hd...>:
>>> Hello Kosaki-san,
>>>
>>> On 03/07/2012 01:18 PM, Satoru Moriya wrote:
>>>> On 03/07/2012 12:19 PM, KOSAKI Motohiro wrote:
>>>>> Thank you, now I remember it. Unfortunately, DB folks are
>>>>> still mainly using RHEL5-generation distros. At that time,
>>>>> swappiness=0 didn't mean disabling swap.
>>>>>
>>>>> What they want is "don't swap as long as the kernel has any file
>>>>> cache pages", but Linux doesn't have such a feature, so they used
>>>>> swappiness to emulate it. So I think this patch clearly harms
>>>>> userland, because we don't have an alternative way.
>>>
>>> As I wrote in the previous mail (see below), with this patch
>>> the kernel begins to swap out when the sum of free pages and
>>> filebacked pages drops below watermark_high.
>
> Actually, this is true only for global reclaim. Reclaim in a cgroup can fail
> in this case.
>
>>>
>>> So the kernel reclaims pages as follows.
>>>
>>> nr_free + nr_filebacked >= watermark_high: reclaim only filebacked pages
>>> nr_free + nr_filebacked <  watermark_high: reclaim only anonymous pages
>>
>> How?
>
> get_scan_count() checks that case explicitly:
>
> 	if (global_reclaim(sc)) {
> 		free  = zone_page_state(mz->zone, NR_FREE_PAGES);
> 		/* If we have very few page cache pages,
> 		   force-scan anon pages. */
> 		if (unlikely(file + free <= high_wmark_pages(mz->zone))) {
> 			fraction[0] = 1;
> 			fraction[1] = 0;
> 			denominator = 1;
> 			goto out;
> 		}
> 	}

Eek. This is silly. Nowadays many people enable THP, and it increases the zone watermarks,
so the high watermark is not a good threshold anymore.

Re: [Dle-develop] [RFC][PATCH] avoid swapping out with swappiness==0

From: Satoru M. <sat...@hd...> - 2012-04-21 00:21:51

Hi,

Sorry for my late reply.

On 04/04/2012 01:38 PM, KOSAKI Motohiro wrote:
> (4/3/12 4:25 AM), Jerome Marchand wrote:
>> On 04/02/2012 07:10 PM, KOSAKI Motohiro wrote:
>>> 2012/3/30 Satoru Moriya <sat...@hd...>:
>>>> So the kernel reclaims pages as follows.
>>>>
>>>> nr_free + nr_filebacked >= watermark_high: reclaim only filebacked pages
>>>> nr_free + nr_filebacked <  watermark_high: reclaim only anonymous pages
>>>
>>> How?
>>
>> get_scan_count() checks that case explicitly:
>>
>>     if (global_reclaim(sc)) {
>>         free  = zone_page_state(mz->zone, NR_FREE_PAGES);
>>         /* If we have very few page cache pages,
>>            force-scan anon pages. */
>>         if (unlikely(file + free <= high_wmark_pages(mz->zone))) {
>>             fraction[0] = 1;
>>             fraction[1] = 0;
>>             denominator = 1;
>>             goto out;
>>         }
>>     }
> 
> Eek. This is silly. Nowadays many people enable THP, and it increases the zone watermarks,
> so the high watermark is not a good threshold anymore.

Ah yes, it is not so small now.
On a 4GB server, min_free_kbytes is 8113 without THP but
67584 with THP.

How about using the low watermark or the min watermark?
Are they still big?

...or should we use another value?
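To put these numbers in pages, here is a rough C sketch of how kernels of this era derive zone watermarks from min_free_kbytes (low = min + min/4, high = min + min/2). Assumptions, stated loudly: 4 KiB pages and a single zone that receives the whole min_free_kbytes budget; on a real 4GB machine the budget is split across zones in proportion to their size, so per-zone values are smaller. The helper name is hypothetical.

```c
/* Zone watermarks, in pages. */
struct wmarks {
	unsigned long min, low, high;
};

/*
 * Hypothetical helper. ASSUMES 4 KiB pages and a single zone that gets the
 * whole min_free_kbytes budget; real kernels split it across zones by size.
 */
static struct wmarks wmarks_from_min_free_kbytes(unsigned long min_free_kbytes)
{
	const unsigned int page_shift = 12;	/* assumed 4 KiB pages */
	/* Convert KiB to pages: 4 KiB per page means shifting right by 2. */
	unsigned long min = min_free_kbytes >> (page_shift - 10);
	struct wmarks w;

	w.min = min;
	w.low = min + (min >> 2);	/* min + 25% */
	w.high = min + (min >> 1);	/* min + 50% */
	return w;
}
```

Under these assumptions, 8113 KB gives min/low/high of 2028/2535/3042 pages (high is about 11.9 MiB), while 67584 KB gives 16896/21120/25344 pages (high is about 99 MiB). In this model, even the THP min watermark (66 MiB) is far above the non-THP high watermark.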

Regards,
Satoru

Re: [Dle-develop] [RFC][PATCH] avoid swapping out with swappiness==0

From: Rik v. R. <ri...@re...> - 2012-05-07 20:11:42

On 03/02/2012 12:36 PM, Satoru Moriya wrote:
> Sometimes we'd like to avoid swapping out anonymous memory. In
> particular, we'd like to avoid swapping out pages of important
> processes or process groups while there is a reasonable amount of
> pagecache in RAM, so that we can satisfy our customers' requirements.
>
> OTOH, we can control how aggressively the kernel swaps memory pages
> with /proc/sys/vm/swappiness for global reclaim and
> /sys/fs/cgroup/memory/memory.swappiness for each memcg.
>
> But with the current reclaim implementation, the kernel may swap out
> even if we set swappiness==0 and there is pagecache in RAM.
>
> This patch changes the behavior with swappiness==0. If we set
> swappiness==0, the kernel does not swap out at all (for global
> reclaim, until the amount of free pages and filebacked pages in a
> zone has been reduced to something very, very small
> (nr_free + nr_filebacked <= high watermark)).

> Signed-off-by: Satoru Moriya <sat...@hd...>

Acked-by: Rik van Riel <ri...@re...>

-- 
All rights reversed

Re: [Dle-develop] [RFC][PATCH] avoid swapping out with swappiness==0

From: Satoru M. <sat...@hd...> - 2012-05-11 21:11:52

On 04/20/2012 08:21 PM, Satoru Moriya wrote:
> On 04/04/2012 01:38 PM, KOSAKI Motohiro wrote:
>> (4/3/12 4:25 AM), Jerome Marchand wrote:
>>> On 04/02/2012 07:10 PM, KOSAKI Motohiro wrote:
>>>> 2012/3/30 Satoru Moriya <sat...@hd...>:
>>>>> So the kernel reclaims pages as follows.
>>>>>
>>>>> nr_free + nr_filebacked >= watermark_high: reclaim only filebacked pages
>>>>> nr_free + nr_filebacked <  watermark_high: reclaim only anonymous pages
>>>>
>>>> How?
>>>
>>> get_scan_count() checks that case explicitly:
>>>
>>>     if (global_reclaim(sc)) {
>>>         free  = zone_page_state(mz->zone, NR_FREE_PAGES);
>>>         /* If we have very few page cache pages,
>>>            force-scan anon pages. */
>>>         if (unlikely(file + free <= high_wmark_pages(mz->zone))) {
>>>             fraction[0] = 1;
>>>             fraction[1] = 0;
>>>             denominator = 1;
>>>             goto out;
>>>         }
>>>     }
>>
>> Eek. This is silly. Nowadays many people enable THP, and it increases the zone watermarks,
>> so the high watermark is not a good threshold anymore.
> 
> Ah yes, it is not so small now.
> On a 4GB server, min_free_kbytes is 8113 without THP but
> 67584 with THP.
> 
> How about using the low watermark or the min watermark?
> Are they still big?
> 
> ...or should we use another value?

What do you think of the idea above?

By the way, I'd like to discuss this topic in another thread,
because the discussion about the optimal threshold at which the kernel
changes its reclaim policy affects not only the swappiness==0
case but all other settings as well.

So I propose that we apply this patch first
and then discuss/improve the threshold.

The patch may not be perfect, but at least it improves the
kernel's behavior when there is enough filebacked memory.
I believe it's better than nothing.

Regards,
Satoru

Re: [Dle-develop] [RFC][PATCH] avoid swapping out with swappiness==0

From: Rik v. R. <ri...@re...> - 2012-05-12 22:21:27

On 05/11/2012 05:11 PM, Satoru Moriya wrote:
> On 04/20/2012 08:21 PM, Satoru Moriya wrote:
>> Ah yes, it is not so small now.
>> On a 4GB server, min_free_kbytes is 8113 without THP but
>> 67584 with THP.
>>
>> How about using the low watermark or the min watermark?
>> Are they still big?
>>
>> ...or should we use another value?
>
> What do you think of the idea above?

I believe that using the high watermark is just fine.

We want to start swapping before the page cache is so
small that we start thrashing.

> So I propose that we apply this patch first
> and then discuss/improve the threshold.
>
> The patch may not be perfect, but at least it improves the
> kernel's behavior when there is enough filebacked memory.
> I believe it's better than nothing.

Agreed.

-- 
All rights reversed

[Dle-develop] [PATCH RESEND] avoid swapping out with swappiness==0

From: Satoru M. <sat...@hd...> - 2012-05-23 20:45:34

Hi Andrew,

This patch has been under review for a couple of months.

This patch *only* improves the behavior when the kernel has
enough filebacked pages. It does not change the behavior
when the kernel has a small number of filebacked pages.

Kosaki-san pointed out that the threshold we use to decide
whether there are enough filebacked pages is not
appropriate (*).

(*) http://www.spinics.net/lists/linux-mm/msg32380.html

As I described in (**), I believe that the threshold discussion
should take place in another thread, because it affects more than
the swappiness=0 case, and below the threshold the kernel behaves
the same way with or without this patch.

(**) http://www.spinics.net/lists/linux-mm/msg34317.html

The patch may not be perfect, but at least it improves the
kernel's behavior when there is enough filebacked memory.
I believe it's better than nothing.

Do you have any comments about it?

NOTE: I updated the patch to add the Acked-by tags.

---
Sometimes we'd like to avoid swapping out anonymous memory. In
particular, we'd like to avoid swapping out pages of important
processes or process groups while there is a reasonable amount of
pagecache in RAM, so that we can satisfy our customers' requirements.

OTOH, we can control how aggressively the kernel swaps memory pages
with /proc/sys/vm/swappiness for global reclaim and
/sys/fs/cgroup/memory/memory.swappiness for each memcg.
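As a how-to for the two knobs named above, a small C sketch that reads and sets a swappiness value through its sysctl-style file. The helper names are hypothetical; writing /proc/sys/vm/swappiness requires root, and the memcg path assumes the cgroup v1 memory controller is mounted at /sys/fs/cgroup/memory as in the text (a specific group's file lives in that group's directory).

```c
#include <stdio.h>

/*
 * Hypothetical helpers for sysctl-style files such as
 * /proc/sys/vm/swappiness or a memcg's memory.swappiness.
 */

/* Returns the parsed value, or -1 if the file can't be opened or parsed. */
static int read_swappiness(const char *path)
{
	FILE *f = fopen(path, "r");
	int val;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/* Returns 0 on success, -1 on failure (e.g. missing file, no permission). */
static int write_swappiness(const char *path, int val)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", val) > 0;
	/* Writes are buffered; a rejected sysctl write may only surface here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, running write_swappiness("/proc/sys/vm/swappiness", 0) as root has the same effect as `sysctl -w vm.swappiness=0`.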

But with the current reclaim implementation, the kernel may swap out
even if we set swappiness==0 and there is pagecache in RAM.

This patch changes the behavior with swappiness==0. If we set
swappiness==0, the kernel does not swap out at all (for global
reclaim, until the amount of free pages and filebacked pages in a
zone has been reduced to something very, very small
(nr_free + nr_filebacked <= high watermark)).

Any comments are welcome.

Regards,
Satoru Moriya

Signed-off-by: Satoru Moriya <sat...@hd...>
Acked-by: Minchan Kim <mi...@ke...>
Acked-by: Rik van Riel <ri...@re...>

---
 mm/vmscan.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 33dc256..52d64bf 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1983,10 +1983,10 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
 	 * proportional to the fraction of recently scanned pages on
 	 * each list that were recently referenced and in active use.
 	 */
-	ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
+	ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1);
 	ap /= reclaim_stat->recent_rotated[0] + 1;
 
-	fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
+	fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
 	fp /= reclaim_stat->recent_rotated[1] + 1;
 	spin_unlock_irq(&mz->zone->lru_lock);
 
@@ -1999,7 +1999,7 @@ out:
 		unsigned long scan;
 
 		scan = zone_nr_lru_pages(mz, lru);
-		if (priority || noswap) {
+		if (priority || noswap || !vmscan_swappiness(mz, sc)) {
 			scan >>= priority;
 			if (!scan && force_scan)
 				scan = SWAP_CLUSTER_MAX;
--
1.7.6.5
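To see what the two hunks change, a simplified C model of the get_scan_count() arithmetic. Assumptions not visible in the diff: anon_prio = swappiness, file_prio = 200 - swappiness, and fraction[0] = ap, fraction[1] = fp, denominator = ap + fp + 1, as in get_scan_count() of this era. Swap is assumed to be configured (noswap false) and the force_scan / SWAP_CLUSTER_MAX minimum is omitted. model_scan() and struct reclaim_stat_model are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's per-zone reclaim statistics.
 * Index 0 = anon LRU, index 1 = file LRU. */
struct reclaim_stat_model {
	unsigned long recent_scanned[2];
	unsigned long recent_rotated[2];
};

/*
 * Simplified model of get_scan_count() for one LRU type ('file' is 0 for
 * anon, 1 for file). 'patched' selects the post-patch formulas.
 * ASSUMPTIONS: swap is configured (noswap == false) and the
 * force_scan / SWAP_CLUSTER_MAX minimum is not modelled.
 */
static unsigned long long model_scan(const struct reclaim_stat_model *rs,
				     unsigned int swappiness,
				     unsigned long lru_pages, int priority,
				     int file, bool patched)
{
	unsigned long anon_prio = swappiness;
	unsigned long file_prio = 200 - swappiness;
	/* Before the patch each prio got +1, so anon never had a zero weight. */
	unsigned long bias = patched ? 0 : 1;
	unsigned long ap, fp;
	unsigned long long fraction[2], denominator, scan;

	ap = (anon_prio + bias) * (rs->recent_scanned[0] + 1);
	ap /= rs->recent_rotated[0] + 1;
	fp = (file_prio + bias) * (rs->recent_scanned[1] + 1);
	fp /= rs->recent_rotated[1] + 1;

	fraction[0] = ap;
	fraction[1] = fp;
	denominator = ap + fp + 1;

	scan = lru_pages;
	/* Before the patch, priority 0 bypassed the proportional split. */
	if (priority || (patched && swappiness == 0)) {
		scan >>= priority;
		scan = scan * fraction[file] / denominator;
	}
	return scan;
}
```

With 100 scanned and 0 rotated pages on each list, 2^20 pages per list and swappiness 0: at priority 12 the unpatched model still scans 1 anon page (the +1 gives anon a nonzero share), while the patched one scans 0. At priority 0 the unpatched model skips the split and scans all 1048576 anon pages, while the patched one still scans 0.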

Re: [Dle-develop] [PATCH RESEND] avoid swapping out with swappiness==0

From: Rik v. R. <ri...@re...> - 2012-05-23 21:46:27

On 05/23/2012 04:41 PM, Satoru Moriya wrote:

> The patch may not be perfect, but at least it improves the
> kernel's behavior when there is enough filebacked memory.
> I believe it's better than nothing.

Agreed.

> Do you have any comments about it?

Only one comment, and it's for Andrew :)

> Signed-off-by: Satoru Moriya <sat...@hd...>
> Acked-by: Minchan Kim <mi...@ke...>
> Acked-by: Rik van Riel <ri...@re...>

Andrew, you can turn my Acked-by into a

Reviewed-by: Rik van Riel <ri...@re...>

This is functionality that many people seem to want, and
will not break anything current users typically do.

-- 
All rights reversed

Re: [Dle-develop] [PATCH RESEND] avoid swapping out with swappiness==0

From: Jerome M. <jma...@re...> - 2012-05-24 09:16:12

On 05/23/2012 10:41 PM, Satoru Moriya wrote:
> Hi Andrew,
> 
> This patch has been under review for a couple of months.
> 
> This patch *only* improves the behavior when the kernel has
> enough filebacked pages. It does not change the behavior
> when the kernel has a small number of filebacked pages.
> 
> Kosaki-san pointed out that the threshold we use to decide
> whether there are enough filebacked pages is not
> appropriate (*).
> 
> (*) http://www.spinics.net/lists/linux-mm/msg32380.html
> 
> As I described in (**), I believe that the threshold discussion
> should take place in another thread, because it affects more than
> the swappiness=0 case, and below the threshold the kernel behaves
> the same way with or without this patch.
> 
> (**) http://www.spinics.net/lists/linux-mm/msg34317.html
> 
> The patch may not be perfect, but at least it improves the
> kernel's behavior when there is enough filebacked memory.
> I believe it's better than nothing.
> 
> Do you have any comments about it?
> 
> NOTE: I updated the patch to add the Acked-by tags.
> 
> ---
> Sometimes we'd like to avoid swapping out anonymous memory. In
> particular, we'd like to avoid swapping out pages of important
> processes or process groups while there is a reasonable amount of
> pagecache in RAM, so that we can satisfy our customers' requirements.
> 
> OTOH, we can control how aggressively the kernel swaps memory pages
> with /proc/sys/vm/swappiness for global reclaim and
> /sys/fs/cgroup/memory/memory.swappiness for each memcg.
> 
> But with the current reclaim implementation, the kernel may swap out
> even if we set swappiness==0 and there is pagecache in RAM.
> 
> This patch changes the behavior with swappiness==0. If we set
> swappiness==0, the kernel does not swap out at all (for global
> reclaim, until the amount of free pages and filebacked pages in a
> zone has been reduced to something very, very small
> (nr_free + nr_filebacked <= high watermark)).
> 
> Any comments are welcome.
> 
> Regards,
> Satoru Moriya
> 
> Signed-off-by: Satoru Moriya <sat...@hd...>
> Acked-by: Minchan Kim <mi...@ke...>
> Acked-by: Rik van Riel <ri...@re...>
> 

Acked-by: Jerome Marchand <jma...@re...>
