Thread: [SSI-devel] strange inetd/tftp behavior
From: Christian L. <ly...@po...> - 2003-01-24 15:31:59
Hi all,

Many thanks for the help. I now have a shiny new 4-machine Debian
cluster :-). But there are a few small problems to fix. One of them is
that after I boot one node, the next node that boots cannot get the
kernel image by tftp until I restart the inetd daemon! I mean, I boot
the master node, then node2; then to boot node3 I have to restart inetd,
and to boot node4 I have to restart inetd again!

--
Christian Lyra
POP-PR - RNP
From: Aneesh K. K.V <ane...@di...> - 2003-01-24 15:40:28
Hi,

I guess inetd is going for a toss with SIGCLUSTER (this is the new
signal we added). Do you find inetd still running after the second node
joins? Are you using the CVS version of SSI?

-aneesh

On Fri, 2003-01-24 at 21:00, Christian Lyra wrote:
> [...] after I boot one node, the next node that boots cannot get the
> kernel image by tftp until I restart the inetd daemon! [...]
From: Christian L. <ly...@po...> - 2003-01-24 15:46:37
Hi,

I'm running version 0.8.0 with Debian woody (downgraded from my last
setup). Inetd is still running after the second node joins the cluster.

On Friday 24 January 2003 13:53, Aneesh Kumar K.V wrote:
> I guess inetd is going for a toss with SIGCLUSTER (this is the new
> signal we added). Do you find inetd running after the second node
> joins? Are you using the CVS version of SSI?

--
Christian Lyra
POP-PR - RNP
From: Christian L. <ly...@po...> - 2003-01-24 19:46:01
Hi,

What is a "toss with SIGCLUSTER"? I tried to transfer the kernel image
with a tftp client, and I can only get the image once after inetd is
started. Every other try fails until inetd is restarted. This doesn't
happen with a vanilla kernel.

On Friday 24 January 2003 13:53, you wrote:
> I guess inetd is going for a toss with SIGCLUSTER (this is the new
> signal we added). [...]

--
Christian Lyra
POP-PR - RNP
From: John B. <joh...@hp...> - 2003-01-24 20:47:23
Christian Lyra wrote:
> What is a "toss with SIGCLUSTER"? I tried to transfer the kernel image
> with a tftp client and I can only get the image once after inetd is
> started. Every other try fails until inetd is restarted. This doesn't
> happen with a vanilla kernel.

Every time a node joins (actually, changes its membership state),
SIGCLUSTER (stolen from one of the realtime signals) gets sent to all
the processes on the system. (This was a holdover from the UnixWare
implementation.) Aneesh suspects your inetd is dying because of this. If
you strace your inetd and it is dying because it is receiving a signal
(49, I believe), then Aneesh is correct.

This is to be changed in the near future using the Linux fasync
mechanism. Applications concerned about changes in the cluster
membership will open a file descriptor in /proc and use the F_SETOWN and
F_SETSIG fcntls to specify that they want to receive signals and which
signal it will be.

John Byrne
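For readers unfamiliar with the F_SETOWN/F_SETSIG pattern John describes, here is a minimal Python sketch of it on Linux. It is only an illustration under stated assumptions: a local socket pair stands in for the planned /proc file (the thread does not give its path), NOTIFY_SIGNAL is an arbitrary realtime signal rather than the SSI one, and the function names are made up for this example.

```python
import fcntl
import os
import signal
import socket
import time

# Assumption: any free realtime signal will do; this is NOT the SSI signal.
NOTIFY_SIGNAL = signal.SIGRTMIN + 1


def enable_async_notification(fd, signum):
    """Ask the kernel to send `signum` to this process when `fd` becomes ready."""
    fcntl.fcntl(fd, fcntl.F_SETOWN, os.getpid())   # who receives the signal
    fcntl.fcntl(fd, fcntl.F_SETSIG, signum)        # which signal replaces SIGIO
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_ASYNC)  # turn fasync on


def demo_fasync(timeout=2.0):
    """Return the list of signal numbers received after one notification."""
    received = []

    def on_notify(signum, frame):
        received.append(signum)

    signal.signal(NOTIFY_SIGNAL, on_notify)
    # Stand-in for the /proc file: activity on `notifier` wakes `watcher`.
    watcher, notifier = socket.socketpair()
    try:
        enable_async_notification(watcher.fileno(), NOTIFY_SIGNAL)
        notifier.send(b"x")  # simulated "membership changed" event
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        watcher.close()
        notifier.close()
    return received
```

The point of the design is that only processes which opt in this way get a signal, and they choose its number themselves, instead of every process on the system receiving SIGCLUSTER.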
From: Christian L. <ly...@po...> - 2003-01-24 20:59:16
Hi,

> Every time a node joins (actually, changes its membership state),
> SIGCLUSTER (stolen from one of the realtime signals) gets sent to all
> the processes on the system. [...] If you strace your inetd and it is
> dying because it is receiving a signal (49, I believe), then Aneesh is
> correct.

Thanks for the explanation, but I think the problem is not with
SIGCLUSTER, because I can reproduce the problem using just the master
node and another (non-cluster) machine with a tftp client (of course,
all other nodes are turned off)! So there are no SIGCLUSTER signals.
Ftpd dies(?) after the first transfer. I tried setting up a telnetd just
to see if it dies too, but this does not happen! Only tftpd dies :-(

--
Christian Lyra
POP-PR - RNP
From: John B. <joh...@hp...> - 2003-01-24 21:11:41
Christian Lyra wrote:
> Thanks for the explanation, but I think the problem is not with
> SIGCLUSTER, because I can reproduce the problem using just the master
> node and another (non-cluster) machine with a tftp client [...]. Ftpd
> dies(?) after the first transfer. I tried setting up a telnetd just to
> see if it dies too, but this does not happen! Only tftpd dies :-(

The ftp/tftp transfer completes successfully and then the daemon dies?
Or does the daemon die partway through the transfer?

The daemon dying after a successful transfer may not be bad. Most inetd
sub-daemons do if they don't receive another request after a given time.

If inetd is running and the tftpd is dead, then a new tftpd should be
spawned. The problem then might be that inetd is not seeing its children
die. If you run strace on inetd during the successful operation followed
by the failure, maybe we could figure out what is going wrong.

John Byrne
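To make the "inetd is not seeing its children die" hypothesis concrete, here is a small Python sketch of how a "wait"-style datagram service is usually supervised: the parent hands the socket to one child and does not listen on it again until that child has been reaped. This is not inetd's code; serve_wait_service and its parameters are hypothetical names, and it uses a blocking waitpid where a real super-server would use SIGCHLD and a non-blocking wait.

```python
import os
import select
import socket


def serve_wait_service(sock, handle, max_requests):
    """Serve `max_requests` datagrams, one child at a time ("wait" semantics).

    Returns the exit status of each child, in order.
    """
    statuses = []
    for _ in range(max_requests):
        # Block until a request is pending on the service socket.
        select.select([sock], [], [])
        pid = os.fork()
        if pid == 0:
            # Child: take over the socket, answer one datagram, then exit.
            status = 1
            try:
                data, peer = sock.recvfrom(2048)
                handle(sock, data, peer)
                status = 0
            finally:
                os._exit(status)
        # Parent: stay away from the socket until this child is reaped.
        # If the exit were never noticed here, the loop would never return
        # to select() and later requests would simply time out.
        _, raw_status = os.waitpid(pid, 0)
        statuses.append(os.WEXITSTATUS(raw_status))
    return statuses
```

In this model, a supervisor that misses the child's exit, or fails to resume listening afterwards, produces exactly one successful transfer followed by timeouts. Whether that is what happens on the SSI kernel is not established in this thread.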
From: Christian L. <ly...@po...> - 2003-01-24 21:35:35
Last try today...

    c3sl2:/boot# /usr/sbin/inetd -d /etc/inetd.conf
    ADD : smtp proto=tcp, wait.max=0.40, user.group=mail.(null) builtin=0 server=/usr/sbin/exim
    ADD : tftp proto=udp, wait.max=1.40, user.group=nobody.(null) builtin=0 server=/usr/sbin/tcpd
    someone wants tftp
    73103 execl /usr/sbin/tcpd
    73103 reaped, status 0        <--- transfer ok

Then inetd never saw another connection (but of course there were other
tries!).

    c3sl2:/boot# strace /usr/sbin/inetd -d /etc/inetd.conf
    [...]
    rt_sigaction(SIGPIPE, {SIG_IGN}, NULL, 8) = 0
    select(6, [4 5], NULL, NULL, NULL) = 1 (in [5])
    write(2, "someone wants tftp\n", 19someone wants tftp
    ) = 19
    rt_sigprocmask(SIG_BLOCK, [HUP ALRM CHLD], NULL, 8) = 0
    gettimeofday({1043443741, 445557}, NULL) = 0
    fork() = 73122
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    select(6, [4], NULL, NULL, NULL73122 execl /usr/sbin/tcpd
    ) = ? ERESTARTNOHAND (To be restarted)
    --- SIGCHLD (Child exited) ---
    wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG, NULL) = 73122
    write(2, "73122 reaped, status 0\n", 2373122 reaped, status 0
    ) = 23
    wait4(-1, 0xbffff870, WNOHANG, NULL) = -1 ECHILD (No child processes)
    sigreturn() = ? (mask now [])
    select(6, [4], NULL, NULL, NULL <unfinished ...>        <--- ctrl+c

tcpdump -i eth0 shows that there are other tftp requests, but inetd
couldn't see them.

Any clues?

On Friday 24 January 2003 19:11, you wrote:
> If inetd is running and the tftpd is dead, then a new tftpd should be
> spawned. The problem then might be that inetd is not seeing its
> children die. If you run strace on inetd during the successful
> operation followed by the failure, maybe we could figure out what is
> going wrong.

--
Christian Lyra
POP-PR - RNP
From: Christian L. <ly...@po...> - 2003-01-24 21:23:10
> The ftp/tftp transfer completes successfully and then the daemon dies?
> Or does the daemon die partway through the transfer?
>
> The daemon dying after a successful transfer may not be bad. Most inetd
> sub-daemons do if they don't receive another request after a given time.

Sorry... maybe I didn't explain myself well. Inetd never dies, the first
transfer works, and there isn't any tftpd left hanging after the
transfer, but the next transfer doesn't happen until inetd is
restarted. Look:

    chuan:/tmp# tftp 10.0.1.1
    tftp> get vmlinuz.nb
    Received 2791134 bytes in 4.9 seconds
    tftp> get vmlinuz.nb
    Transfer timed out.

    tftp>

Then I restarted inetd while the get was running:

    tftp> get vmlinuz.nb
    Received 2791134 bytes in 7.2 seconds
    tftp>

> If inetd is running and the tftpd is dead, then a new tftpd should be
> spawned. The problem then might be that inetd is not seeing its
> children die. If you run strace on inetd during the successful
> operation followed by the failure, maybe we could figure out what is
> going wrong.

I will try this. May I ask for your help again? I'm not used to running
strace. May I just strace inetd at the prompt?

--
Christian Lyra
POP-PR - RNP
From: John B. <joh...@hp...> - 2003-01-24 21:28:32
Christian Lyra wrote:
> Sorry... maybe I didn't explain myself well. Inetd never dies, the
> first transfer works, [...] but the next transfer doesn't happen until
> inetd is restarted.
> [...]
> I will try this. May I ask for your help again? I'm not used to running
> strace. May I just strace inetd at the prompt?

Once inetd is up and running, do "strace -o strace.out -p <inetd pid>"
and do your tftp transfers. If you add the switches "-fF" it will follow
and trace any children as well.

John Byrne
From: John B. <joh...@hp...> - 2003-01-24 21:59:37
Christian Lyra wrote:
>> Once inetd is up and running, do "strace -o strace.out -p <inetd pid>"
>> and do your tftp transfers. If you add the switches "-fF" it will
>> follow and trace any children as well.
>>
>> John Byrne

You might try putting a wrapper shell script around tftpd that does the
strace, then. Or perhaps around tcpd. I'm not sure this will work, but
it might get me the information on what is going wrong.

John Byrne
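John suggests a shell script; the same wrapper idea is sketched here in Python, purely as an illustration. The daemon path, the trace directory and the function names are assumptions, not taken from the thread, and only the strace options -f (follow children) and -o (output file) are used.

```python
import os
import sys

# Assumptions for illustration only: adjust to the real daemon and a
# directory the service user (e.g. "nobody") can write to.
REAL_DAEMON = "/usr/sbin/in.tftpd"
TRACE_DIR = "/tmp"


def build_strace_argv(real_daemon, daemon_args, trace_dir, pid):
    """Build the argv that runs `real_daemon` under strace.

    One trace file per invocation, named after the wrapper's pid, so that
    the successful and the failing transfer end up in separate files.
    """
    trace_file = os.path.join(trace_dir, "tftpd-strace.%d.out" % pid)
    return ["strace", "-f", "-o", trace_file, real_daemon] + list(daemon_args)


def main():
    # Replace this process with strace; stdin (the socket handed over by
    # inetd) is inherited unchanged by strace and then by the daemon.
    argv = build_strace_argv(REAL_DAEMON, sys.argv[1:], TRACE_DIR, os.getpid())
    os.execvp(argv[0], argv)

# When installed as the wrapper, call main() at the end of the script and
# point the tftp line in /etc/inetd.conf at the wrapper instead of the
# real daemon.
```

If the second request never produces a new trace file, that suggests the wrapper was never started, which would point at inetd rather than at tftpd.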