From: Stanislav K. <sta...@or...> - 2013-09-30 13:16:05
On 09/30/2013 05:04 PM, ch...@su... wrote:
> Hi!
>>> Hmmm, Mems_allowed* content of /proc/$test_pid/status before sending
>>> SIGUSR1 to $test_pid is:
>>>
>>> Mems_allowed: 00000000,00000001
>>> Mems_allowed_list: 0
>>> cpuset11 1 TFAIL : hog the memory on the unexpected
>>> node(FilePages_For_Nodes(KB): _0: 112
>>> _1: 0
>>> _2: 0
>>> _3: 0, Expect Nodes: 0).
>>>
>>> i.e. it is a failure on:
>>> base_test "0" "0" "0" "0"
>>>
>>> So maybe it's a testcase issue. I'm looking more deeply.
>>>
>>>
>> It seems that just
>> /bin/echo 3 > /proc/sys/vm/drop_caches
>> is not sufficient to free memory.
> Right, that drops only caches that are synced to disks and could be
> freed without waiting for writeback.
>
>> According to Documentation/sysctl/vm.txt we need to invoke 'sync'
>> before that.
>>
>> I removed all the 'sleep's from the testcases and added a single sync
>> before sending SIGUSR1 to mem_hog.
>>
>> After that this testcase started to work reliably.
> Sounds good.
>
> So the problem (given the symptoms) may be that the test process wasn't
> moved to the desired cpuset right away because the kernel needed to shuffle
> the process caches, and since the transition wasn't complete the memory
> allocated meanwhile was still on the wrong node?

I suppose the problem is of a different nature: the caches are not dropped
after running 'dd'. Since dd is executed in the root cpuset (at the
beginning of the testcase, where no other cpusets are defined yet), these
caches are spread across all memory nodes.

The test process is placed in the desired cpuset. It reads DATAFILE, which
is already in the page cache, so this read doesn't increase memory
consumption. The delta (memory consumption after mem_hog's read minus
memory consumption before it, measured on each node) is therefore
approximately zero, and the testcase fails.
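A minimal sketch of the fix and of the measurement discussed above, assuming a POSIX sh and the per-node meminfo files under /sys/devices/system/node/. The function names (flush_page_cache, node_filepages_kb, filepages_snapshot, filepages_delta) are hypothetical and not taken from the LTP cpuset testcases: sync first so dirty pages become droppable, then compare per-node FilePages before and after mem_hog reads DATAFILE.

```shell
#!/bin/sh
# Hypothetical helpers, not the actual LTP cpuset code.

# drop_caches only frees clean cache, so write back dirty pages first;
# otherwise the DATAFILE created by dd may stay cached on every node.
# Must be run as root.
flush_page_cache()
{
	sync
	echo 3 > /proc/sys/vm/drop_caches
}

# Print the FilePages value (kB) from a per-node meminfo file, whose
# line looks like "Node 0 FilePages:          112 kB".
node_filepages_kb()
{
	awk '$3 == "FilePages:" { print $4 }' "$1"
}

# Print one "<node> <FilePages kB>" line per memory node.
filepages_snapshot()
{
	for f in /sys/devices/system/node/node[0-9]*/meminfo; do
		# Skip the literal pattern if no node directories exist.
		[ -r "$f" ] || continue
		node=${f%/meminfo}
		node=${node##*node}
		echo "$node $(node_filepages_kb "$f")"
	done
}

# Given two snapshot files, print "<node> <after - before>" per node.
filepages_delta()
{
	awk 'NR == FNR { before[$1] = $2; next } { print $1, $2 - before[$1] }' "$1" "$2"
}
```

Usage would be along the lines of `flush_page_cache; filepages_snapshot > before; (let mem_hog read DATAFILE); filepages_snapshot > after; filepages_delta before after`. If the file is still cached from the dd run, every node's delta stays near zero, which matches the symptom in the TFAIL output.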
>
> If that is so, we may also add a check that the process has already been
> moved to the cpuset before starting the test. But let's proceed with the
> fixes and cleanups we already have first.
>
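One possible shape for the check suggested above, sketched under the assumption that the kernel exposes /proc/<pid>/cpuset (CONFIG_CPUSETS). wait_in_cpuset is a hypothetical helper, not part of LTP: it compares the task's cpuset path (relative to the cpuset mount point, e.g. "/1") with the expected one and polls briefly in case the move has not completed yet.

```shell
#!/bin/sh
# Hypothetical helper, not part of LTP.
# Usage: wait_in_cpuset <pid> <expected cpuset path> [tries]
# Returns 0 once /proc/<pid>/cpuset matches, 1 after <tries> failed
# checks (default 10, one second apart), 2 if the file is unreadable.
wait_in_cpuset()
{
	pid=$1
	expected=$2
	tries=${3:-10}

	while :; do
		# /proc/<pid>/cpuset only exists on kernels with CONFIG_CPUSETS.
		current=$(cat "/proc/$pid/cpuset" 2>/dev/null) || return 2
		[ "$current" = "$expected" ] && return 0
		tries=$((tries - 1))
		[ "$tries" -le 0 ] && return 1
		sleep 1
	done
}
```

A testcase could call it right after writing the PID into the cpuset's tasks file and report TBROK if it returns nonzero, instead of relying on fixed sleeps.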