From: Subrata M. <su...@li...> - 2008-05-06 14:26:31
On Mon, 2008-05-05 at 16:43 +0800, Lin Feng Shen wrote:
>
> Signed-off-by: Lin Feng Shen <she...@cn...>

Thanks. Applied.

Regards--
Subrata

>
> Thanks & Best regards,
> ----------
> Lin Feng Shen 沈林峰
> Linux for System p Test, China Systems & Technology Lab
> China Development Labs, Beijing
> Tel: 86-10-82452244 Ext. 53535 Fax:
> 2312
> Email: she...@cn...
> Address: 5F, De Shi Building, No.9, Shangdi East Road, Haidian
> District, Beijing, P.R.China 100085
>
> Subrata Modak <su...@li...>
> 05-05-08 02:41 PM
> Please respond to su...@li...
>
> To: ltp-list <ltp...@li...>
> cc: Lin Feng Shen/China/IBM@IBMCN, supriyak <sup...@in...>
> Subject: [PATCH] Arbitrary usleep time in LTP hugeshmctl01 results in
> incorrect execution order
>
> Hi all,
>
> Please see the problem description for the hugeshmctl01 test case in LTP,
> and the corresponding solution:
>
> =================================================================
> Problem Description: Lin Feng Shen
> =================================================================
> I am testing hugetlb with ltp-full-20080430. The cases under
> ${LTPROOT}/testcases/kernel/mem/hugetlb/ are executed one by one, again
> and again. The test runs fine for the first few hundred loops, but after
> hugeshmctl01 fails for the first time, several other cases start failing
> frequently too.
>
> ---------------- Here is the staf status -----------------
> $> /proc/sys/kernel # gss
> Hostname :
> Kernel : 2.6.16.60-0.17-ppc64
> Kernel Build Date : Tue Apr 22 07:28:35 UTC 2008
> Distribution : SUSE
> --------
>
> BASE Start Time: Fri May 2 14:32:06 CDT 2008
> Snapshot Time: Sun May 4 03:48:38 CDT 2008
> --------
> hugemmap01 (0)-local;944;7858;8802
> hugemmap02 (0)-local;8802;0;8802
> hugemmap03 (0)-local;8801;0;8802
> hugemmap04 (0)-local;908;7893;8801
> hugeshmat01 (0)-local;945;7857;8802
> hugeshmat02 (0)-local;909;7893;8802
> hugeshmat03 (0)-local;945;7857;8802
> hugeshmctl01 (0)-local;943;7859;8802
> hugeshmctl02 (0)-local;908;7894;8802
> hugeshmctl03 (0)-local;944;7858;8802
> hugeshmdt01 (0)-local;944;7858;8802
> hugeshmget01 (0)-local;945;7857;8802
> hugeshmget02 (0)-local;8802;0;8802
> hugeshmget03 (0)-local;8802;0;8802
> hugeshmget05 (0)-local;945;7857;8802
> --pass--fail--unused
>
> ---------------- Here is the ltp log ----------------
> The first failure is hugeshmctl01.
>
> hugeshmctl01 1 FAIL : # of attaches is incorrect - 3
> hugeshmctl01 2 PASS : pid, size, # of attaches and mode are correct - pass #2
> hugeshmctl01 3 PASS : new mode and change time are correct
> hugeshmctl01 4 PASS : shared memory appears to be removed
>
> ------- Here is the meminfo -------
> Before hugeshmctl01 fails:
>
> clashlp1:~ # cat /proc/meminfo | tail -4
> HugePages_Total: 32
> HugePages_Free: 32
> HugePages_Rsvd: 0
> Hugepagesize: 16384 kB
> clashlp1:~ #
>
> After hugeshmctl01 fails:
>
> clashlp1:~ # cat /proc/meminfo | tail -4
> HugePages_Total: 32
> HugePages_Free: 30
> HugePages_Rsvd: 30
> Hugepagesize: 16384 kB
> clashlp1:~ #
> -------------------------------------
>
> It seems that hugeshmctl01 doesn't free some hugetlb pages when it fails.
> ps shows that an instance of hugeshmctl01 is still left over even though
> the test is no longer running, and that process may still have hugetlb
> pages attached.
> -------------------------------------
> clashlp1:~ # ps ax | grep huge
> 14166 pts/23 S+ 0:00 grep huge
> 29360 ? S 0:00 hugeshmctl01
> clashlp1:~ #
> -------------------------------------
>
> The problem is caused by the arbitrary usleep() time in hugeshmctl01, which
> results in an incorrect execution order. The sleep is meant to ensure that
> the children call shmat() and pause() before the parent checks the shm
> status and calls stat_cleanup(). But there is no absolute guarantee that
> this sleep always works.
> ------------
> /* sleep briefly to ensure correct execution order */
> usleep(250000);
> ------------
>
> In the failure above, the last child process forked by the parent may not
> get to run and call shmat() immediately after it is created. When the parent
> checks the shm status, it finds only 3 children attached to the shm instead
> of 4, so it reports the failure. It then calls stat_cleanup() to send
> SIGUSR1 to all children, but since the last child hasn't called pause() yet,
> SIGUSR1 is handled before pause(). When the last child does call pause(), no
> further signal arrives to wake it up, so it sleeps forever.
> =================================================================
> Patch: Lin Feng Shen
> =================================================================
> Patch to ensure children can receive and handle SIGUSR1 from the parent in
> pause()
>
> The patch does not change the arbitrary usleep() time, since any value would
> be arbitrary, although a larger one is more acceptable. Instead, it uses
> sigprocmask() to block SIGUSR1 before the children wait for SIGUSR1 from the
> parent, and then calls sigsuspend() to unblock SIGUSR1 and sleep until it
> arrives. This avoids the infinite sleep, and with it the shm segment staying
> attached forever and affecting the other hugetlb tests.
>
> In the parent process, another sigprocmask() call is made before usleep().
> This has the same effect as sleeping longer.
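The block-then-sigsuspend() technique described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch itself (which was attached to the original mail and is not reproduced here): run_children(), on_sigusr1(), got_usr1 and MAX_CHILDREN are hypothetical names, and the shmat() call and shm status check are left out so the sketch runs without hugetlb pages. Because SIGUSR1 is blocked before fork(), each child inherits the blocked mask, so a signal that arrives "too early" stays pending instead of being consumed before the child starts waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64 /* hypothetical limit for this sketch */

static volatile sig_atomic_t got_usr1 = 0;

/* Only records the signal; its main job is to interrupt sigsuspend(). */
static void on_sigusr1(int sig)
{
	(void)sig;
	got_usr1 = 1;
}

/*
 * Hypothetical helper (not LTP code): fork nchildren that each wait for
 * SIGUSR1, signal them right away with no usleep(), and return how many
 * children exited with status 0, or -1 on a setup error.
 */
static int run_children(int nchildren)
{
	struct sigaction sa, old_sa;
	sigset_t usr1, oldmask, waitmask;
	pid_t pids[MAX_CHILDREN];
	int i, forked = 0, ok = 0, status;

	assert(nchildren > 0 && nchildren <= MAX_CHILDREN);

	sa.sa_handler = on_sigusr1;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	if (sigaction(SIGUSR1, &sa, &old_sa) == -1)
		return -1;

	/* Block SIGUSR1 before fork(); children inherit this mask. */
	sigemptyset(&usr1);
	sigaddset(&usr1, SIGUSR1);
	if (sigprocmask(SIG_BLOCK, &usr1, &oldmask) == -1) {
		sigaction(SIGUSR1, &old_sa, NULL);
		return -1;
	}
	/* Mask used while waiting: the old mask with SIGUSR1 unblocked. */
	waitmask = oldmask;
	sigdelset(&waitmask, SIGUSR1);

	for (i = 0; i < nchildren; i++) {
		pid_t pid = fork();
		if (pid == -1)
			break; /* still signal and reap the children we have */
		if (pid == 0) {
			/* Child: the real test would call shmat() here. */
			/* sigsuspend() unblocks SIGUSR1 and sleeps atomically,
			 * so an already-pending SIGUSR1 is delivered at once
			 * and the child cannot hang the way pause() can. */
			while (!got_usr1)
				sigsuspend(&waitmask);
			_exit(0);
		}
		pids[forked++] = pid;
	}

	/* Parent: signal immediately; no sleep is needed for correctness. */
	for (i = 0; i < forked; i++)
		kill(pids[i], SIGUSR1);

	for (i = 0; i < forked; i++) {
		if (waitpid(pids[i], &status, 0) == pids[i] &&
		    WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	}

	sigprocmask(SIG_SETMASK, &oldmask, NULL);
	sigaction(SIGUSR1, &old_sa, NULL);
	return ok;
}
```

With this structure, run_children(4) is expected to return 4 even though the parent sends SIGUSR1 without sleeping first. With the unblocked-SIGUSR1-plus-pause() pattern from the report, a signal sent before a child reaches pause() would be handled early and leave that child asleep, which matches the leftover hugeshmctl01 process seen in ps.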
>
> With this patch, I haven't seen the problem again.
> --------------------------
> Kernel : 2.6.16.60-0.17-ppc64
> Kernel Build Date : Tue Apr 22 07:28:35 UTC 2008
> Distribution : SUSE
> --------
>
> BASE Start Time: Sun May 4 20:26:11 CDT 2008
> Snapshot Time: Mon May 5 00:05:21 CDT 2008
> --------
> hugemmap01 (0)-local;803;0;80
> hugemmap02 (0)-local;803;0;80
> hugemmap03 (0)-local;803;0;80
> hugemmap04 (0)-local;803;0;80
> hugeshmat01 (0)-local;803;0;80
> hugeshmat02 (0)-local;803;0;80
> hugeshmat03 (0)-local;803;0;80
> hugeshmctl01 (0)-local;803;0;80
> hugeshmctl02 (0)-local;803;0;80
> hugeshmctl03 (0)-local;803;0;80
> hugeshmdt01 (0)-local;803;0;80
> hugeshmget01 (0)-local;803;0;80
> hugeshmget02 (0)-local;803;0;80
> hugeshmget03 (0)-local;803;0;80
> hugeshmget05 (0)-local;803;0;80
> =================================================================
> End Description & Solution
> =================================================================
>
> Please check whether any of you face the same problem, and whether the
> patch solves it for you too.
>
> Regards--
> Subrata
> [attachment "05_05_2008-(she...@cn...)-hugeshmctl01.patch"
> deleted by Lin Feng Shen/China/IBM]
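The report spotted the leftover pages by comparing /proc/meminfo by hand (HugePages_Free dropping from 32 to 30 and HugePages_Rsvd rising from 0 to 30 after the failure). A looping harness could flag the first test that leaves huge pages behind with a check along these lines. This is a minimal sketch under assumptions: parse_hugepages() and hugepages_leaked() are hypothetical helpers, not part of LTP, and the harness is assumed to capture the meminfo text before and after each test.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract HugePages_Free and HugePages_Rsvd from text
 * in /proc/meminfo format. Returns 0 if both fields were found, else -1.
 */
static int parse_hugepages(const char *text, long *free_pages, long *rsvd_pages)
{
	const char *line = text;
	int found = 0;
	long value;

	assert(text != NULL && free_pages != NULL && rsvd_pages != NULL);

	while (line != NULL && *line != '\0') {
		/* sscanf() only matches when the line starts with the label. */
		if (sscanf(line, "HugePages_Free: %ld", &value) == 1) {
			*free_pages = value;
			found |= 1;
		} else if (sscanf(line, "HugePages_Rsvd: %ld", &value) == 1) {
			*rsvd_pages = value;
			found |= 2;
		}
		line = strchr(line, '\n');
		if (line != NULL)
			line++; /* advance to the start of the next line */
	}
	return found == 3 ? 0 : -1;
}

/*
 * Hypothetical check: returns 1 if free huge pages dropped or reserved huge
 * pages grew between the two snapshots, 0 if not, -1 on a parse error.
 */
static int hugepages_leaked(const char *before, const char *after)
{
	long free_b, rsvd_b, free_a, rsvd_a;

	if (parse_hugepages(before, &free_b, &rsvd_b) != 0 ||
	    parse_hugepages(after, &free_a, &rsvd_a) != 0)
		return -1;
	return free_a < free_b || rsvd_a > rsvd_b;
}
```

Fed the two snapshots from the report, hugepages_leaked() returns 1; comparing a snapshot with itself returns 0.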