From: Kelly L. <luc...@fa...> - 2005-08-23 22:09:36
I agree, as I almost always get test case failures when testing this
specific testcase.

kdl

Kelly D. Lucas
luc...@fa...

Sripathi Kodi wrote:

> Hi,
>
> Please copy me on replies as I have not subscribed to this list.
>
> I am seeing nanosleep02 testcase failure on RHEL3 machines (kernel
> 2.4.21-32.11.EL and other RHEL3 kernels). I believe the accuracy
> expected out of 'nanosleep' call in this testcase is too high and that
> is the cause of the problem. I have explained the reasons and provided
> a patch for the testcase below. Failure looks like this:
>
> nanosleep02    1  FAIL  :  Remaining sleep time 4010000 usec doesn't match with the expected 3999622 usec time
> nanosleep02    1  FAIL  :  child process exited abnormally
>
> Explanation:
> ------------
> nanosleep02 testcase does the following:
>     gettimeofday - before
>     nanosleep.   - Interrupt the sleeping process by sending SIGINT
>     gettimeofday - after
>
> It then compares the time remaining returned by nanosleep with the
> difference in time shown by the two gettimeofday calls. It expects the
> difference to be less than 2000 microseconds.
>
> Compare this to nanosleep01 testcase. It is a simple test that lets
> the process sleep with nanosleep and checks how long it has slept. The
> SLOP_MS value, which is the allowed error margin, is 250 milliseconds.
> It does all calculations and comparisons in milliseconds.
>
> Whereas nanosleep02 does all calculations in microseconds. However,
> 'nanosleep' call is limited by the kernel timer mechanism, which can
> afford an accuracy of 1/HZ at best. From the manpage of 'nanosleep':
> "The current implementation of nanosleep is based on the normal kernel
> timer mechanism, which has a resolution of 1/HZ s (i.e, 10 ms on
> Linux/i386 and 1 ms on Linux/Alpha). Therefore, nanosleep pauses always
> for at least the specified time, however it can take up to 10 ms
> longer than specified until the process becomes runnable again.
> For the same reason, the value returned in case of a delivered
> signal in *rem is usually rounded to the next larger multiple of 1/HZ s."
>
> So while looking at the remaining time returned by an interrupted
> 'nanosleep' call, the error margin we should be ready to allow is
> (1/HZ * 2), which is 20 milliseconds. In fact 'nanosleep01' is quite
> generous!
>
> The behavior shown by 'nanosleep' call is acceptable by POSIX
> standards as well. It says
> (http://www.opengroup.org/onlinepubs/007908799/xsh/nanosleep.html):
> "The suspension time may be longer than requested because the argument
> value is rounded up to an integer multiple of the sleep resolution or
> because of the scheduling of other activity by the system. But, except
> for the case of being interrupted by a signal, the suspension time
> will not be less than the time specified by rqtp, as measured by the
> system clock, CLOCK_REALTIME."
>
> So I think we should change the testcase. I am attaching a patch to
> suggest changes for this. Please note that I have set the error margin
> to be same as nanosleep01, which is 250 milliseconds. It can be set as
> low as 20 milliseconds, but if the system is heavily loaded, it may
> lead to the test failing again.
>
>
> Thanks and regards,
> Sripathi.
>
> Patch:
> ------
>
> --- testcases/kernel/syscalls/nanosleep/nanosleep02.c    2005-08-02 13:48:29.000000000 -0500
> +++ /home/sripathi/17215/nanosleep02.c    2005-08-02 13:40:38.000000000 -0500
> @@ -101,7 +101,7 @@ void sig_handler();    /* signal catching
>   * the "rem" field would never change without the increased
>   * usec precision in the -aa tree.
>   */
> - #define USEC_PRECISION 2200    /* Originally set at 100 max but this compiler bug has been around for years. */
> +#define MSEC_PRECISION 250    /* Error margin allowed in milliseconds */
>
>  int
>  main(int ac, char **av)
> @@ -185,7 +185,7 @@ main(int ac, char **av)
>  void
>  do_child()
>  {
> -    unsigned long req, rem, before, after, elapsed;    /* usec */
> +    unsigned long req, rem, before, after, elapsed;    /* msec */
>      struct timeval otime;    /* time before child execution suspended */
>      struct timeval ntime;    /* time after child resumes execution */
>
> @@ -208,15 +208,15 @@ do_child()
>       * The time remaining should be equal to the
>       * Total time for sleep - time spent on sleep bfr signal
>       */
> -    req = timereq.tv_sec * 1000000 + timereq.tv_nsec / 1000;
> -    rem = timerem.tv_sec * 1000000 + timerem.tv_nsec / 1000;
> -    before = otime.tv_sec * 1000000 + otime.tv_usec;
> -    after = ntime.tv_sec * 1000000 + ntime.tv_usec;
> +    req = timereq.tv_sec * 1000 + timereq.tv_nsec / 1000000;
> +    rem = timerem.tv_sec * 1000 + timerem.tv_nsec / 1000000;
> +    before = otime.tv_sec * 1000 + otime.tv_usec/1000;
> +    after = ntime.tv_sec * 1000 + ntime.tv_usec/1000;
>      elapsed = after - before;
>
> -    if (rem - (req - elapsed) > USEC_PRECISION) {
> -        tst_resm(TFAIL, "Remaining sleep time %lu usec doesn't "
> -            "match with the expected %lu usec time",
> +    if (rem - (req - elapsed) > MSEC_PRECISION) {
> +        tst_resm(TFAIL, "Remaining sleep time %lu msec doesn't "
> +            "match with the expected %lu msec time",
>              rem, (req - elapsed));
>          exit(1);
>      }
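For anyone who wants to try the millisecond comparison outside the test
harness, here is a rough standalone sketch of the check the patch performs.
The helper names (timespec_to_msec, timeval_to_msec, remaining_time_ok) are
made up for illustration and are not part of LTP. It also differs from the
patch in one respect, which is my assumption about what is wanted: it
compares the absolute difference using signed arithmetic, because the
patch's unsigned "rem - (req - elapsed)" wraps to a very large value
whenever rem comes out smaller than the expected time.

```c
#include <stdbool.h>
#include <sys/time.h>
#include <time.h>

/* Margin in milliseconds, mirroring the 250 ms proposed in the patch. */
#define MSEC_PRECISION 250LL

/* Hypothetical helper: timespec to whole milliseconds (remainder truncated). */
long long timespec_to_msec(const struct timespec *ts)
{
    return (long long)ts->tv_sec * 1000LL + ts->tv_nsec / 1000000LL;
}

/* Hypothetical helper: timeval to whole milliseconds (remainder truncated). */
long long timeval_to_msec(const struct timeval *tv)
{
    return (long long)tv->tv_sec * 1000LL + tv->tv_usec / 1000LL;
}

/*
 * Return true when the remaining time reported by an interrupted
 * nanosleep() agrees with (requested - elapsed) to within
 * MSEC_PRECISION milliseconds, in either direction.
 */
bool remaining_time_ok(const struct timespec *req,
                       const struct timespec *rem,
                       const struct timeval *before,
                       const struct timeval *after)
{
    long long elapsed = timeval_to_msec(after) - timeval_to_msec(before);
    long long expected = timespec_to_msec(req) - elapsed;
    long long diff = timespec_to_msec(rem) - expected;

    if (diff < 0)
        diff = -diff;   /* signed math, so a short "rem" cannot wrap */

    return diff <= MSEC_PRECISION;
}
```

With the numbers from the failure report (remaining 4010000 usec against an
expected 3999622 usec) the difference is about 10 ms, roughly one timer tick
at HZ=100, so a check like this passes with a 250 ms margin and would also
pass with the 20 ms margin mentioned above.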