From: Daniel W. <wid...@ci...> - 2002-05-29 16:40:00
|
#include <stdio.h>
#include <sys/bproc.h>
#include <pwd.h>
#include <sys/types.h>

main() {
  struct passwd *pwent = getpwuid(0);
  printf("%s\n",pwent->pw_name);
}

# bpsh 0 ./a.out
bpsh: Child process exited abnormally.

Same for getgrgid(). This is the problem to which Nic Henke was referring on 5/10 on this mailing list. Apologies if this bug is already known and listed.

The reason we're using this: pam_bproc module for access control.

Any ideas? How do I go about debugging bpsh calls in general? strace output on master doesn't seem promising. strace -f gives:

[root@alpha pam_bproc]# strace -f -o bpsh.out bpsh 0 ./a.out
Process 9229 attached
Process 9228 suspended

and bpsh.out ends in:

[pid 8940] open("/etc/ld.so.cache", O_RDONLY) = 3
[pid 8940] fstat64(3, {st_mode=S_IFREG|0644, st_size=16199, ...}) = 0
[pid 8940] old_mmap(NULL, 16199, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40018000
[pid 8940] close(3) = 0
[pid 8940] open("/lib/libnss_files.so.2", O_RDONLY) = 3
[pid 8940] read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\200\0"..., 1024) = 1024
[pid 8940] fstat64(3, {st_mode=S_IFREG|0755, st_size=261460, ...}) = 0
[pid 8940] old_mmap(NULL, 42408, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4015b000
[pid 8940] mprotect(0x40165000, 1448, PROT_NONE) = 0
[pid 8940] old_mmap(0x40165000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x9000) = 0x40165000
[pid 8940] close(3) = 0
[pid 8940] munmap(0x40018000, 16199) = 0
[pid 8940] --- SIGSEGV (Segmentation fault) ---
[pid 8938] <... select resumed> ) = 2 (in [4 5], left {299, 640000})
[pid 8938] rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
[pid 8938] close(4) = 0
[pid 8938] read(5, "", 4096) = 0
[pid 8938] close(5) = 0
[pid 8938] rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
[pid 8938] select(7, [0 3 6], [], NULL, {300, 0}) = 1 (in [6], left {300, 0})
[pid 8938] rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
[pid 8938] read(6, "", 4096) = 0
[pid 8938] close(6) = 0
[pid 8938] wait4(-1, <unfinished ...>
[pid 8940] +++ killed by SIGSEGV +++

Thanks,
Dan W.

--
-- Daniel Widyono http://www.cis.upenn.edu/~widyono
-- Linux Cluster Group, CIS Dept., SEAS, University of Pennsylvania
-- Mail: Rm 556, CIS Dept 200 S 33rd St Philadelphia, PA 19104
From: Jag <ag...@li...> - 2002-05-29 18:20:38
|
On Wed, 2002-05-29 at 12:39, Daniel Widyono wrote:
> #include <stdio.h>
> #include <sys/bproc.h>
> #include <pwd.h>
> #include <sys/types.h>
>
> main() {
>   struct passwd *pwent = getpwuid(0);
>   printf("%s\n",pwent->pw_name);
> }
>
>
> # bpsh 0 ./a.out
> bpsh: Child process exited abnormally.
>
>
> Same for getgrgid().

Check if pwent is NULL. Unless you managed to properly set up all the nss stuff on the slave node, it won't be able to find the info for uid 0, and will thus return NULL, and your printf statement will likely cause a segfault.
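The NULL check suggested here can be sketched roughly as follows. This is a minimal illustration and not code from the thread: the helper name print_user_name is hypothetical, and the sketch assumes only the standard getpwuid() behaviour of returning NULL both when no entry matches and when the lookup itself fails, with errno telling the two cases apart.

```c
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not from the thread): look up a uid and print
 * the login name. Returns 0 on success, -1 if there is no matching
 * entry or the lookup failed. */
int print_user_name(uid_t uid)
{
    struct passwd *pwent;

    errno = 0;  /* reset so "no entry" can be told apart from an error */
    pwent = getpwuid(uid);
    if (pwent == NULL) {
        if (errno != 0)
            fprintf(stderr, "getpwuid(%lu): %s\n",
                    (unsigned long)uid, strerror(errno));
        else
            fprintf(stderr, "getpwuid(%lu): no matching entry\n",
                    (unsigned long)uid);
        return -1;
    }

    /* Only dereference the result once it is known to be non-NULL. */
    printf("%s\n", pwent->pw_name);
    return 0;
}
```

Calling print_user_name(0) from main() in place of the bare printf would turn a missing passwd entry into an error message instead of a segfault. It cannot help if the crash happens inside getpwuid() itself.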
From: Daniel W. <wi...@ci...> - 2002-05-31 13:18:08
|
Good point, I should have been more thorough in my test case. However,

main() {
  struct passwd *pwent = getpwuid(0);
  printf("pwent = %p\n", (void *)pwent);
  if (pwent) {
    printf("pwent->pw_name = \"%s\"\n",pwent->pw_name);
  }
}

[root@alpha pam_bproc]# ./a.out
pwent = 0x4016008c
pwent->pw_name = "root"
[root@alpha pam_bproc]# bpsh 0 ./a.out
bpsh: Child process exited abnormally.

This happens in getpwuid, not in my app. I also confirmed that this runs fine on the node when not run via bpsh. One more check, not using bpsh:

main() {
  int pid;
  if (pid = bproc_rfork(0)) {
    wait(pid);
  } else if (pid == 0) {
    struct passwd *pwent = getpwuid(0);
    printf("pwent = %p\n", (void *)pwent);
    if (pwent) {
      printf("pwent->pw_name = \"%s\"\n",pwent->pw_name);
    }
  } else {
    perror("could not rfork\n");
    exit (-1);
  }
}

This works fine. Signs seem to point to bpsh interaction. Any other bproc debugging advice?

Thanks,
Dan W.

On Wed, May 29, 2002 at 02:18:55PM -0400, Jag wrote:
> Check if pwent is NULL. Unless you managed to properly set up all the
> nss stuff on the slave node, it won't be able to find the info for uid
> 0, and will thus return NULL, and your printf statement will likely
> cause a segfault.
From: Erik A. H. <er...@he...> - 2002-05-31 19:21:24
|
On Wed, May 29, 2002 at 12:39:53PM -0400, Daniel Widyono wrote:
> Any ideas? How do I go about debugging bpsh calls in general? strace output
> on master doesn't seem promising. strace -f gives:

What does the environment look like? Are you using a Clustermatic or Scyld-like environment? Specifically, I'm wondering if you're using the normal NSS stuff for lookups or using beonss/bproc. beonss is a bit quirky when you ask for user names.

As far as strace goes, you can take two routes with bpsh. Both require BProc version 3 or later.

  strace bpsh -N 0 command

This turns off ALL I/O forwarding. This means that you don't get to see output but bpsh *will* exec the command directly so strace will be attached to the right process.

Another alternative is to use the undocumented "-S" flag. This stops the child process before doing bproc_execmove. That gives you a window to attach strace to the child before it runs. For example:

  $ bpsh 0 -S uptime
  <it hangs here>

In another window:

  $ ps xf | grep uptime
  20172 pts/10   S      0:00 grep uptime
  20169 pts/9    S      0:00 bpsh 0 -S uptime
  20170 pts/9    T      0:00  \_ bpsh 0 -S uptime
  $ strace -p 20170
  --- SIGSTOP (Stopped (signal)) ---
  SYS_291(0x304, 0, 0x11ffff110, 0x11ffff8d0, 0x11ffff8e0) = 0
  SYS_0(0x304, 0, 0x11ffff110, 0, 0x11ffff8e0) = -1 ERRNO_339 (errno 339)
  SYS_0(0x11ffff2b8, 0x20000005870, 0, 0, 0x2) = 17
  open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
  open("/etc/ld.so.cache", O_RDONLY) = 3
  ....

watch uptime run to completion...

- Erik
--
Erik Arjan Hendriks     Printed On 100 Percent Recycled Electrons
er...@he...             Contents may settle during shipment
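The "stop the child before it execs" idea behind the -S flag can be illustrated with a small sketch. This is not bpsh's implementation: it assumes plain fork() and execvp() standing in for the BProc migration step (bproc_execmove is not called), and the function names spawn_stopped and resume_and_wait are hypothetical.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that stops itself before exec.
 * Returns the child's pid once it is confirmed stopped, -1 on error. */
pid_t spawn_stopped(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        raise(SIGSTOP);          /* park here until SIGCONT arrives */
        execvp(argv[0], argv);   /* stand-in for the real exec/move step */
        _exit(127);              /* only reached if exec failed */
    }

    /* Parent: WUNTRACED makes waitpid() report the stop, not just exit. */
    if (waitpid(pid, &status, WUNTRACED) < 0 || !WIFSTOPPED(status))
        return -1;
    return pid;
}

/* Hypothetical helper: let the stopped child continue and collect its
 * exit status. Returns the exit code, or -1 on error or abnormal exit. */
int resume_and_wait(pid_t pid)
{
    int status;

    if (kill(pid, SIGCONT) < 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Between spawn_stopped() returning and resume_and_wait() being called, the child sits in state T in ps output, which is the window in which a tracer such as strace -p <pid> could be attached from another terminal.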
From: Daniel W. <wi...@ci...> - 2002-06-01 11:19:52
|
> What does the environment look like? Are you using a Clustermatic
> or Scyld-like environment? Specifically, I'm wondering if you're
> using the normal NSS stuff for lookups or using beonss/bproc.

Thanks, Erik. We're using the Clubmask environment, plain standard RH7.2 nss stuff. We have two clusters which are operating fine under this environment (24 nodes & 64 nodes), and one which isn't (64 nodes -- and I'm trying to debug via bproc to determine what is different). All Dell 2550 front ends to Dell 1550 nodes.

> As far as strace goes, you can take two routes with bpsh. Both
> require BProc version 3 or later.
[...]
> strace bpsh -N 0 command
> Another alternative is to use the undocumented "-S" flag.

3.1.9, so I'll try your recommendations out, thanks again!

Dan W.