From: Sripathi K. <sri...@in...> - 2005-08-23 20:00:37
Hi,

Please copy me on replies, as I am not subscribed to this list.

I am seeing the nanosleep02 testcase fail on RHEL3 machines (kernel 2.4.21-32.11.EL and other RHEL3 kernels). I believe the accuracy this testcase expects from the 'nanosleep' call is too high, and that this is the cause of the failure. I explain the reasons and provide a patch for the testcase below.

The failure looks like this:

nanosleep02 1 FAIL : Remaining sleep time 4010000 usec doesn't match with the expected 3999622 usec time
nanosleep02 1 FAIL : child process exited abnormally

Explanation:
------------
The nanosleep02 testcase does the following:

    gettimeofday - before
    nanosleep    - interrupt the sleeping process by sending SIGINT
    gettimeofday - after

It then compares the remaining time returned by nanosleep with the requested time minus the elapsed time shown by the two gettimeofday calls. It expects the two to differ by no more than USEC_PRECISION, which is 2200 microseconds in the current source.

Compare this to the nanosleep01 testcase. It is a simple test that lets the process sleep with nanosleep and checks how long it has slept. Its SLOP_MS value, the allowed error margin, is 250 milliseconds, and it does all calculations and comparisons in milliseconds, whereas nanosleep02 does all calculations in microseconds.

However, the 'nanosleep' call is limited by the kernel timer mechanism, which offers a resolution of 1/HZ at best. From the manpage of 'nanosleep':

"The current implementation of nanosleep is based on the normal kernel timer mechanism, which has a resolution of 1/HZ s (i.e, 10 ms on Linux/i386 and 1ms on Linux/Alpha). Therefore, nanosleep pauses always for at least the specified time, however it can take up to 10 ms longer than specified until the process becomes runnable again. For the same reason, the value returned in case of a delivered signal in *rem is usually rounded to the next larger multiple of 1/HZ s."

So when looking at the remaining time returned by an interrupted 'nanosleep' call, the error margin we should be ready to allow is 2 * (1/HZ), which is 20 milliseconds on Linux/i386. In fact 'nanosleep01' is quite generous!

The behavior shown by the 'nanosleep' call is also permitted by the POSIX standard, which says (http://www.opengroup.org/onlinepubs/007908799/xsh/nanosleep.html):

"The suspension time may be longer than requested because the argument value is rounded up to an integer multiple of the sleep resolution or because of the scheduling of other activity by the system. But, except for the case of being interrupted by a signal, the suspension time will not be less than the time specified by rqtp, as measured by the system clock, CLOCK_REALTIME."

So I think we should change the testcase. I am attaching a patch that suggests the changes. Please note that I have set the error margin to the same value as nanosleep01, which is 250 milliseconds. It could be set as low as 20 milliseconds, but on a heavily loaded system the test might then fail again.

Thanks and regards,
Sripathi.

Patch:
------
--- testcases/kernel/syscalls/nanosleep/nanosleep02.c 2005-08-02 13:48:29.000000000 -0500
+++ /home/sripathi/17215/nanosleep02.c 2005-08-02 13:40:38.000000000 -0500
@@ -101,7 +101,7 @@ void sig_handler(); /* signal catching
  * the "rem" field would never change without the increased
  * usec precision in the -aa tree.
  */
-#define USEC_PRECISION 2200 /* Originally set at 100 max but this compiler bug has been around for years. */
+#define MSEC_PRECISION 250 /* Error margin allowed in milliseconds */
 
 int
 main(int ac, char **av)
@@ -185,7 +185,7 @@ main(int ac, char **av)
 void
 do_child()
 {
-    unsigned long req, rem, before, after, elapsed; /* usec */
+    unsigned long req, rem, before, after, elapsed; /* msec */
     struct timeval otime; /* time before child execution suspended */
     struct timeval ntime; /* time after child resumes execution */
 
@@ -208,15 +208,15 @@ do_child()
          * The time remaining should be equal to the
          * Total time for sleep - time spent on sleep bfr signal
          */
-        req = timereq.tv_sec * 1000000 + timereq.tv_nsec / 1000;
-        rem = timerem.tv_sec * 1000000 + timerem.tv_nsec / 1000;
-        before = otime.tv_sec * 1000000 + otime.tv_usec;
-        after = ntime.tv_sec * 1000000 + ntime.tv_usec;
+        req = timereq.tv_sec * 1000 + timereq.tv_nsec / 1000000;
+        rem = timerem.tv_sec * 1000 + timerem.tv_nsec / 1000000;
+        before = otime.tv_sec * 1000 + otime.tv_usec/1000;
+        after = ntime.tv_sec * 1000 + ntime.tv_usec/1000;
         elapsed = after - before;
 
-        if (rem - (req - elapsed) > USEC_PRECISION) {
-            tst_resm(TFAIL, "Remaining sleep time %lu usec doesn't "
-                "match with the expected %lu usec time",
+        if (rem - (req - elapsed) > MSEC_PRECISION) {
+            tst_resm(TFAIL, "Remaining sleep time %lu msec doesn't "
+                "match with the expected %lu msec time",
                 rem, (req - elapsed));
             exit(1);
         }