From: Jeffrey N. <jno...@ca...> - 2001-11-27 13:37:17
|
Hi guys, very cool testsuite. Wanted to let you know I am experiencing a couple of problems getting the tests to run correctly and wondered if you know why.

I am running RedHat 7.1 (kernel update to 2.4.9-13 but had same problems on earlier kernels) on IBM x330 1GHz 2 processor machines, 1GB RAM, 2GB swap using ltp 20011107.

My basic process is just as outlined in the INSTALL file.

login as root
tar zxf ltp-20011107.tgz
cd ltp-20011107
make
./runalltests.sh

First problem:
All durations are listed as big negative numbers when tests end. I have this isolated to pan.c line 1159. The call to time(&t) should be outside the if (logfile != NULL) { statement since I am simply logging to console at the moment.

Second problem:
hang (I think) at ioctl02 test.
I have waited up to several hours for this test to complete but it never does. I am not knowledgeable enough with ioctls or ttys to be able to help on this one. But pan reports the command line as:
ioctl02 -D /dev/tty0
and ps -ef shows two instances running.

Fortunately, I have several of these machines available so I can just leave the test running (it's not using any CPU or anything). If you need any further information from me, just let me know and I'll see what I can get for you.

Jeff--- |
From: Paul L. <pl...@au...> - 2001-11-27 14:07:05
|
On Tue, 2001-11-27 at 13:39, Jeffrey Nowland wrote:
> First problem:
> All durations are listed as big negative numbers when tests end. I have this isolated to pan.c line 1159. The call to time(&t) should be outside the if (logfile != NULL) { statement since I am simply logging to console at the moment.

Odd that nobody ever noticed that before. I guess nobody here cares about time! :) Thanks, I'll fix that up and it'll be in the next release.

> Second problem:
> hang (I think) at ioctl02 test.
> I have waited up to several hours for this test to complete but it never does. I am not knowledgeable enough with ioctls or ttys to be able to help on this one. But pan reports the command line as:
>

That test should not take several hours, it should run through very quickly. Does ps show it in D state or anything? Are you running this test over NFS?

Thanks,
Paul Larson |
From: Jeffrey N. <jno...@ca...> - 2001-11-27 15:59:23
|
Ok, this is one of those cases where having the linux source is very cool. Since I don't know much about ioctl or tty I figured I'd go learn.

So here's the deal with ioctl02. I did an strace on the child process and got a rapid ERESTARTSYS loop. Bummer. Looking at kernel source I came across the one place in the tty code where ERESTARTSYS is returned and why (and more importantly what I can do about it).

If you write to a tty and you are either a background process or don't own the tty, and you are not ignoring or blocking SIGTTOU, then the signal is delivered to your process and the sys call is restarted.

So, by adding signal(SIGTTOU, SIG_IGN) just prior to the ioctl(..TCFLSH) call the test now passes. What I don't know is if this is the right thing to do or not. |
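The workaround described in the message above can be sketched roughly as follows. This is not the actual ioctl02 source: the helper name flush_tty_ignoring_sigttou is made up for illustration, and it assumes the flush is the Linux TCFLSH ioctl with TCIOFLUSH as the argument. It ignores SIGTTOU only for the duration of the flush and then restores the previous disposition.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/ioctl.h>
#include <termios.h>

/*
 * Hypothetical helper (not LTP code): flush both queues of the tty on fd
 * while SIGTTOU is ignored, then put the old disposition back.
 * Returns the ioctl() result; errno is preserved from the ioctl() call.
 */
int flush_tty_ignoring_sigttou(int fd)
{
    void (*previous)(int);
    int rc, saved_errno;

    /* Ignore SIGTTOU so a background caller is not signalled for the flush. */
    previous = signal(SIGTTOU, SIG_IGN);
    if (previous == SIG_ERR)
        return -1;

    rc = ioctl(fd, TCFLSH, TCIOFLUSH);
    saved_errno = errno;

    /* Restore whatever was installed before; signal() hands back SIG_IGN. */
    previous = signal(SIGTTOU, previous);
    assert(previous == SIG_IGN);

    errno = saved_errno;
    return rc;
}
```

Restoring the old disposition keeps the change local to the one call, which may matter if the rest of a test is supposed to see SIGTTOU. Whether ignoring the signal fixes the test or hides a kernel problem is the open question in the thread; the sketch does not settle it.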
From: Paul L. <pl...@au...> - 2001-11-30 16:42:31
|
On Tue, 2001-11-27 at 13:39, Jeffrey Nowland wrote:
> First problem:
> All durations are listed as big negative numbers when tests end. I have this isolated to pan.c line 1159. The call to time(&t) should be outside the if (logfile != NULL) { statement since I am simply logging to console at the moment.
>

I've taken a look at all my log files from way back and I now know why we haven't caught this before. I've never even seen this happen. Can you reproduce it? I don't see how putting the call to time(&t) outside the if statement would help any.

I did notice something odd though, in your proposed fix, you said it should be at line #1159. The current copy of pan.c is only 1158 lines long though and the line you were talking about is more like 597. Have you modified it in some way?

Has anyone else seen negative numbers for the duration?

-Paul Larson |
From: Paul L. <pl...@au...> - 2001-11-30 19:35:39
|
On Fri, 2001-11-30 at 17:11, Jeffrey Nowland wrote:
> If you look at the runalltests.sh script as it unpacks, there is no -l file
> on the pan command line so t is effectively 0. A little later at line 617
> you pass the uninitialized (if logfile == NULL) t to write_test_end which I
> think is the guy who mis-states the duration.

Oh, I was looking in the wrong place. Because of that line, I assumed you were getting the negative values in the duration field of the logfile, not the output. I took the -l out of my script and you are absolutely correct. Thanks for the fix, the version in cvs has been updated.

-Paul Larson |
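The shape of the duration bug discussed above can be illustrated with a small sketch. It is not pan.c: the function names are invented, and t is set to 0 in the buggy variant (matching the "effectively 0" description) so that the example is well defined instead of reading an uninitialized variable.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the code that reports a test's duration. */
long duration_seconds(time_t start, time_t end)
{
    return (long)(end - start);
}

/* Buggy shape: the end time is only taken when a logfile was requested. */
long buggy_duration(FILE *logfile, time_t start)
{
    time_t t = 0;   /* left unset in the report; 0 models "effectively 0" */

    if (logfile != NULL)
        time(&t);
    return duration_seconds(start, t);
}

/* Fixed shape: take the end time first, then decide whether to log it. */
long fixed_duration(FILE *logfile, time_t start)
{
    time_t t;

    time(&t);
    assert(t != (time_t)-1);    /* time() reports failure as (time_t)-1 */
    if (logfile != NULL)
        fprintf(logfile, "end=%ld\n", (long)t);
    return duration_seconds(start, t);
}
```

With no logfile, buggy_duration(NULL, time(NULL)) works out to minus the current epoch time, which would explain the "big negative numbers" on the console, while fixed_duration(NULL, time(NULL)) is zero or a small positive value.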
From: Jeffrey N. <jno...@ca...> - 2001-11-30 19:46:19
|
Excellent. Did you get a chance to look at the SIGTTOU thing I sent for the ioctl02 test? I am not sure if I am fixing the test or hiding a kernel bug with that one.

If you don't remember what I'm talking about, my original message also had a hang problem when running the ioctl02 test from runalltests.sh.

I traced things down with strace and found line 379 doing
if(ioctl(cfd, TCFLSH, 2) < 0)
was stuck in an ERESTARTSYS loop in the kernel.

A little investigation of the kernel code showed me that it was trying to signal SIGTTOU because the child was either backgrounded or didn't own the tty that was being flushed (/dev/tty0 in this case). I noticed that if SIGTTOU is either blocked or ignored that the signal would not be generated and the flush would take place as planned.

I added a line signal(SIGTTOU, SIG_IGN) just prior to that line and the test completes satisfactorily.

I think I mentioned I know nothing about ioctl or tty devices so I haven't the foggiest whether or not what I did fixed the test or masked a kernel bug.

Jeff---

> _______________________________________________
> Ltp-list mailing list
> Ltp...@li...
> https://lists.sourceforge.net/lists/listinfo/ltp-list
|
From: Paul L. <pl...@au...> - 2001-11-30 20:14:07
|
On Fri, 2001-11-30 at 19:48, Jeffrey Nowland wrote:
> A little investigation of the kernel code showed me that it was trying to
> signal SIGTTOU because the child was either backgrounded or didn't own the
> tty that was being flushed (/dev/tty0 in this case). I noticed that if
> SIGTTOU is either blocked or ignored that the signal would not be generated
> and the flush would take place as planned.
>
> I added a line signal(SIGTTOU, SIG_IGN) just prior to that line and the test
> completes satisfactorily.
>
> I think I mentioned I know nothing about ioctl or tty devices so I haven't
> the foggiest whether or not what I did fixed the test or masked a kernel
> bug.

Yes, we've been working on reproducing this problem and have managed to find 2 machines that do it with some regularity. This one is rather confusing to me, shouldn't I be able to make it happen all the time by doing ioctl02 -D/dev/tty0 & ??? I tried that a few times and it always worked just fine without hanging. I saw the kernel code you were talking about in tty_check_change, and the comment above it makes it look like it is an intentional POSIX behaviour that writing or setting the state of a terminal from a background process should send a SIGTTOU and return -ERESTARTSYS. The problem I'm having is that it doesn't seem to be consistent. On some machines, I never see it happen. On some, it happens at random.

This is what I'm doing for now, unless someone has objections. I changed the ioctl02 test for now (in a slightly different way than you did, but it still blocks the signal). I hate to have one hanging test bog down the whole test suite. Pending further investigation, maybe we need a test that simply writes to, or changes the state of, a tty from a background process and expects to receive a SIGTTOU. I'd be interested in knowing where the inconsistency comes from.

Thanks,
Paul Larson |
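The thread says the signal is not generated when SIGTTOU is ignored or blocked. A minimal sketch of checking and setting that state from user space follows. The helper names are hypothetical, and the actual change made to ioctl02 is not shown in the thread, so this should be read as one possible approach using sigaction and sigprocmask, not as the committed fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Returns 1 if SIGTTOU is currently ignored or blocked, 0 if not, -1 on error. */
int sigttou_ignored_or_blocked(void)
{
    struct sigaction current;
    sigset_t mask;

    if (sigaction(SIGTTOU, NULL, &current) < 0)
        return -1;
    if (current.sa_handler == SIG_IGN)
        return 1;

    /* A NULL set leaves the mask unchanged and just reports it. */
    if (sigprocmask(SIG_BLOCK, NULL, &mask) < 0)
        return -1;
    return sigismember(&mask, SIGTTOU);
}

/* Block SIGTTOU, saving the previous mask in *saved for later restore. */
int block_sigttou(sigset_t *saved)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGTTOU);
    if (sigprocmask(SIG_BLOCK, &set, saved) < 0)
        return -1;
    assert(sigttou_ignored_or_blocked() == 1);
    return 0;
}

/* Put back a mask saved by block_sigttou(). */
int restore_mask(const sigset_t *saved)
{
    return sigprocmask(SIG_SETMASK, saved, NULL);
}
```

A test that wants to stay out of the hang could call block_sigttou() before touching the tty and restore_mask() afterwards; a test that wants to observe the SIGTTOU behaviour would instead make sure sigttou_ignored_or_blocked() returns 0 first.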
From: Jeffrey N. <jno...@ca...> - 2001-11-30 20:21:32
|
Well if you need any information from me (i.e. machine configs, types, etc) just let me know.

Jeff--- |
From: Paul L. <pl...@au...> - 2001-11-30 20:29:07
|
I just noticed something new:

if (current->tty != tty)
        return 0;

That is at the beginning of tty_check_change. So, you can reproduce it reliably by logging in to tty1 from the console for instance, and doing a
ioctl02 -D /dev/tty1 &

So if I'm reading this right, if current->tty and tty don't match, then it should return 0 and NOT hang.

16332 tty2     T      0:00 ioctl02 -D /dev/tty0
      ^^^^---------------------------------^^^^

That sure looks wrong to me. I'll try to do some debugging on this before I push much further, but if someone out there has more insight into tty's and ioctls, feel free to chime in. In the meantime, we at least have a reliable way of reproducing the hang scenario. I'm still going to change the testcase for now though until we know more. After the new ltp goes out, just remove the sigaction call that blocks SIGTTOU if you want to look at this, or leave it in if you want your testsuite to run uninterrupted.

Thanks,
Paul Larson |
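One way to look at the "is this my tty, and am I in the background" question from user space is tcgetpgrp(), which fails when the descriptor is not the caller's controlling terminal and otherwise reports the foreground process group. The sketch below is a diagnostic aid under that assumption; the enum and function names are invented here, and it does not reproduce the kernel's tty_check_change logic.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical classification of how the caller relates to the tty on fd. */
enum tty_relation {
    TTY_NOT_CONTROLLING_OR_ERROR = -1, /* tcgetpgrp() failed (ENOTTY, EBADF, ...) */
    TTY_BACKGROUND = 0,                /* controlling tty, caller not in foreground group */
    TTY_FOREGROUND = 1                 /* controlling tty, caller in foreground group */
};

enum tty_relation classify_tty(int fd)
{
    pid_t foreground = tcgetpgrp(fd);

    if (foreground == (pid_t)-1)
        return TTY_NOT_CONTROLLING_OR_ERROR;
    return foreground == getpgrp() ? TTY_FOREGROUND : TTY_BACKGROUND;
}

/* Human-readable name for log output. */
const char *tty_relation_name(enum tty_relation relation)
{
    switch (relation) {
    case TTY_NOT_CONTROLLING_OR_ERROR:
        return "not controlling tty (or error)";
    case TTY_BACKGROUND:
        return "background";
    case TTY_FOREGROUND:
        return "foreground";
    }
    assert(!"unknown tty_relation value");
    return "";
}
```

Printing tty_relation_name(classify_tty(fd)) just before the flush in a copy of the test might show which case a hanging machine is in. It may also be worth keeping in mind that on Linux /dev/tty0 refers to the currently active virtual console, so a process whose controlling terminal is tty2 could still end up operating on its own tty when it opens /dev/tty0; whether that explains the ps output above is not established in the thread.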