Thread: [SSI-devel] SUSE: onnode problem
From: Bharata B R. <bha...@hp...> - 2004-11-30 08:31:23

On a 2 node SuSE cluster, I am facing a problem with onnode. When I run any
command on node 2 from node 1 using onnode, onnode fails due to failure of
the rexec syscall (strace output given at the end).

# onnode 2 /bin/ls
can't execute /bin/ls, errno=116. (ESTALE)

# onnode 2 ls
ls not found.

The cluster is running a pre-built openssi FC2 kernel with the openssi-tools
and cluster-tools rpms installed (no other openssi rpms present). Both nodes
have been transitioned into the UP state manually by the clusternode_setstate
command.

Any idea what might be happening here?

Regards,
Bharata.

cluster -V looks like this:

Node 1:
State: UP
Previous state: COMINGUP
Reason for last transition: API
Last transition ID: 2
Last transition time: Sat Nov 27 01:02:14.173936 2004
First transition ID: 1
First transition time: Fri Nov 26 07:48:18.150000 2004
Number of CPUs: 4
Number of CPUs online: 4

Node 2:
State: UP
Previous state: COMINGUP
Reason for last transition: API
Last transition ID: 4
Last transition time: Sat Nov 27 01:09:54.543936 2004
First transition ID: 3
First transition time: Sat Nov 27 01:07:23.403936 2004
Number of CPUs: 1
Number of CPUs online: 1

/proc/mounts is like this:

rootfs / rootfs rw 0 0
/dev/root /initrd ext2 rw 0 0
/dev/root / cfs rw 0 0
none /cluster/node1/dev cfs rw 0 0
none /dev cfs rw 0 0
proc /proc proc rw 0 0
devpts /cluster/dev/pts devpts rw 0 0
/dev/cciss/c0d0p5 /boot ext3 rw 0 0
/dev/cciss/c0d0p9 /home ext3 rw 0 0
/dev/cciss/c0d0p8 /usr ext3 rw 0 0
tmpfs /dev/shm tmpfs rw 0 0
usbdevfs /proc/bus/usb usbdevfs rw 0 0
00000037 /cluster/node2/dev cfs rw 0 0
00000037 /dev cfs rw 0 0

strace of 'onnode 2 /bin/ls':

execve("/bin/onnode", ["onnode", "2", "/bin/ls"], [/* 54 vars */]) = 0
uname({sys="Linux", node="pushya4", ...}) = 0
brk(0) = 0x863b000
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3ff07000
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("//lib/tls/i686/mmx/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("//lib/tls/i686/mmx", 0xbfe26910) = -1 ENOENT (No such file or directory)
open("//lib/tls/i686/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("//lib/tls/i686", 0xbfe26910) = -1 ENOENT (No such file or directory)
open("//lib/tls/mmx/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("//lib/tls/mmx", 0xbfe26910) = -1 ENOENT (No such file or directory)
open("//lib/tls/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("//lib/tls", 0xbfe26910) = -1 ENOENT (No such file or directory)
open("//lib/i686/mmx/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("//lib/i686/mmx", 0xbfe26910) = -1 ENOENT (No such file or directory)
open("//lib/i686/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("//lib/i686", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("//lib/mmx/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("//lib/mmx", 0xbfe26910) = -1 ENOENT (No such file or directory)
open("//lib/libcluster.so.0", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\\\26\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=16940, ...}) = 0
old_mmap(NULL, 19876, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x5cb000
old_mmap(0x5cf000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x3000) = 0x5cf000
close(3) = 0
open("//lib/i686/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320]\1"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1461208, ...}) = 0
old_mmap(NULL, 1256644, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xb21000
old_mmap(0xc4d000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x12c000) = 0xc4d000
old_mmap(0xc52000, 7364, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xc52000
close(3) = 0
SYS_290(0xbfe27270, 0x14, 0x804a66c, 0x5cc724, 0x5cfcb0) = 125
SYS_290(0xbfe27200, 0x14, 0x3ff07c60, 0xb2910c, 0x5cfcb0) = 0
SYS_292(0xbff08796, 0xbfe2731c, 0xbfe27324, 0x2, 0xe87290) = -1 ESTALE (Stale NFS file handle)
write(2, " can\'t execute /bin/ls, errno=11"..., 35 can't execute /bin/ls, errno=116.
) = 35
exit_group(-1) = ?
From: Brian J. W. <Bri...@hp...> - 2004-11-30 23:15:55

Bharata B Rao wrote:
> On a 2 node SuSE cluster, I am facing a problem with onnode.
>
> When I run any command on node 2 from node 1 using onnode, onnode fails
> due to failure of the rexec syscall (strace output given at the end)
>
> # onnode 2 /bin/ls
> can't execute /bin/ls, errno=116. (ESTALE)
>
> # onnode 2 ls
> ls not found.
>
> The cluster is running a pre-built openssi FC2 kernel with openssi-tools
> and cluster-tools rpms installed (no other openssi rpms present)
>
> Both the nodes have been transitioned into the UP state manually by the
> clusternode_setstate command.
>
> Any idea what might be happening here?

Are there any interesting console messages on node 2? How about error
messages in /cluster/node2/var/log/messages? The failure's probably
happening on node 2, and there should be some error message to indicate
where in the kernel code it's failing.

> SYS_290(0xbfe27270, 0x14, 0x804a66c, 0x5cc724, 0x5cfcb0) = 125
> SYS_290(0xbfe27200, 0x14, 0x3ff07c60, 0xb2910c, 0x5cfcb0) = 0
> SYS_292(0xbff08796, 0xbfe2731c, 0xbfe27324, 0x2, 0xe87290) = -1 ESTALE
> (Stale NFS file handle)

If you install the SSI-enhanced version of strace, these system calls
will be more descriptive. The SYS_292() is the rexecve() call that's
failing.

Brian
From: Bharata B R. <bha...@hp...> - 2004-12-01 13:30:15

On Wed, 2004-12-01 at 04:45, Brian J. Watson wrote:
> Bharata B Rao wrote:
>> On a 2 node SuSE cluster, I am facing a problem with onnode.
>>
>> When I run any command on node 2 from node 1 using onnode, onnode fails
>> due to failure of the rexec syscall (strace output given at the end)
>>
>> # onnode 2 /bin/ls
>> can't execute /bin/ls, errno=116. (ESTALE)
>>
>> # onnode 2 ls
>> ls not found.
>>
>> The cluster is running a pre-built openssi FC2 kernel with openssi-tools
>> and cluster-tools rpms installed (no other openssi rpms present)
>>
>> Both the nodes have been transitioned into the UP state manually by the
>> clusternode_setstate command.
>>
>> Any idea what might be happening here?
>
> Are there any interesting console messages on node 2? How about error
> messages in /cluster/node2/var/log/messages? The failure's probably
> happening on node 2, and there should be some error message to indicate
> where in the kernel code it's failing.

This is the message observed on node 2's console:

reop_import_path: no such path: /cluster/dev/pts/2

However, I do have this:

# ls -l /cluster/dev/pts/2
crw--w---- 1 root tty 136, 2 Dec 2 08:23 /cluster/dev/pts/2

And /cluster/node2/var/log/messages is not yet created, since syslogd has
not yet run on node 2, I assume.

>> SYS_290(0xbfe27270, 0x14, 0x804a66c, 0x5cc724, 0x5cfcb0) = 125
>> SYS_290(0xbfe27200, 0x14, 0x3ff07c60, 0xb2910c, 0x5cfcb0) = 0
>> SYS_292(0xbff08796, 0xbfe2731c, 0xbfe27324, 0x2, 0xe87290) = -1 ESTALE
>> (Stale NFS file handle)
>
> If you install the SSI-enhanced version of strace, these system calls
> will be more descriptive. The SYS_292() is the rexecve() call that's
> failing.

Here's the strace output with the openssi strace (onnode 2 /bin/ls):

execve("/bin/onnode", ["onnode", "2", "/bin/ls"], [/* 54 vars */]) = 0
uname({sys="Linux", node="pushya4", ...}) = 0
brk(0) = 0xa028000
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3ff95000
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("//lib/tls/i686/mmx/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("//lib/tls/i686/mmx", 0xbfee02c0) = -1 ENOENT (No such file or directory)
open("//lib/tls/i686/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("//lib/tls/i686", 0xbfee02c0) = -1 ENOENT (No such file or directory)
open("//lib/tls/mmx/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("//lib/tls/mmx", 0xbfee02c0) = -1 ENOENT (No such file or directory)
open("//lib/tls/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("//lib/tls", 0xbfee02c0) = -1 ENOENT (No such file or directory)
open("//lib/i686/mmx/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("//lib/i686/mmx", 0xbfee02c0) = -1 ENOENT (No such file or directory)
open("//lib/i686/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("//lib/i686", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("//lib/mmx/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat64("//lib/mmx", 0xbfee02c0) = -1 ENOENT (No such file or directory)
open("//lib/libcluster.so.0", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\\\26\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=16940, ...}) = 0
old_mmap(NULL, 19876, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x883000
old_mmap(0x887000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x3000) = 0x887000
close(3) = 0
open("//lib/i686/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320]\1"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1461208, ...}) = 0
old_mmap(NULL, 1256644, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x408000
old_mmap(0x534000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x12c000) = 0x534000
old_mmap(0x539000, 7364, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x539000
close(3) = 0
ssisys({id=SSISYS_CLUSTER_MAXNODES:4}, 20) = 125
ssisys({id=SSISYS_SET_NODE_CONTEXT:4, node=2}, 20) = 0
rexecve("/bin/ls", ["/bin/ls"], [/* 54 vars */], 2) = -1 ESTALE (Stale NFS file handle)
write(2, " can\'t execute /bin/ls, errno=11"..., 35 can't execute /bin/ls, errno=116.
) = 35
exit_group(-1) = ?
From: John B. <joh...@hp...> - 2004-12-01 18:22:30

Bharata B Rao wrote:
> On Wed, 2004-12-01 at 04:45, Brian J. Watson wrote:
>
>> Bharata B Rao wrote:
>>
>>> On a 2 node SuSE cluster, I am facing a problem with onnode.
>>>
>>> When I run any command on node 2 from node 1 using onnode, onnode fails
>>> due to failure of the rexec syscall (strace output given at the end)
>>>
>>> # onnode 2 /bin/ls
>>> can't execute /bin/ls, errno=116. (ESTALE)
>>>
>>> # onnode 2 ls
>>> ls not found.
>>>
>>> The cluster is running a pre-built openssi FC2 kernel with openssi-tools
>>> and cluster-tools rpms installed (no other openssi rpms present)
>>>
>>> Both the nodes have been transitioned into the UP state manually by the
>>> clusternode_setstate command.
>>>
>>> Any idea what might be happening here?
>>
>> Are there any interesting console messages on node 2? How about error
>> messages in /cluster/node2/var/log/messages? The failure's probably
>> happening on node 2, and there should be some error message to indicate
>> where in the kernel code it's failing.
>
> This is the message observed on node 2's console:
>
> reop_import_path: no such path: /cluster/dev/pts/2
>
> However, I do have this:
>
> # ls -l /cluster/dev/pts/2
> crw--w---- 1 root tty 136, 2 Dec 2 08:23 /cluster/dev/pts/2
>
> And /cluster/node2/var/log/messages is not yet created, since syslogd
> has not yet run on node 2, I assume.

The ESTALE is probably due to the "reop_import_path" error message. If
you log into the console and try "onnode 2 /bin/ls" does it work any
better? If it does, then something about the distributed /dev is probably
the problem. If you run the base kernel, does the /cluster/dev/pts
directory exist? Do the /cluster/node1/dev and the /cluster/node2/dev
directories exist? Is devfsd running on each node?

John

>>> SYS_290(0xbfe27270, 0x14, 0x804a66c, 0x5cc724, 0x5cfcb0) = 125
>>> SYS_290(0xbfe27200, 0x14, 0x3ff07c60, 0xb2910c, 0x5cfcb0) = 0
>>> SYS_292(0xbff08796, 0xbfe2731c, 0xbfe27324, 0x2, 0xe87290) = -1 ESTALE
>>> (Stale NFS file handle)
>>
>> If you install the SSI-enhanced version of strace, these system calls
>> will be more descriptive. The SYS_292() is the rexecve() call that's
>> failing.
>
> Here's the strace output with the openssi strace (onnode 2 /bin/ls):
>
> execve("/bin/onnode", ["onnode", "2", "/bin/ls"], [/* 54 vars */]) = 0
> uname({sys="Linux", node="pushya4", ...}) = 0
> brk(0) = 0xa028000
> old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3ff95000
> open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("//lib/tls/i686/mmx/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
> stat64("//lib/tls/i686/mmx", 0xbfee02c0) = -1 ENOENT (No such file or directory)
> open("//lib/tls/i686/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
> stat64("//lib/tls/i686", 0xbfee02c0) = -1 ENOENT (No such file or directory)
> open("//lib/tls/mmx/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
> stat64("//lib/tls/mmx", 0xbfee02c0) = -1 ENOENT (No such file or directory)
> open("//lib/tls/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
> stat64("//lib/tls", 0xbfee02c0) = -1 ENOENT (No such file or directory)
> open("//lib/i686/mmx/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
> stat64("//lib/i686/mmx", 0xbfee02c0) = -1 ENOENT (No such file or directory)
> open("//lib/i686/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
> stat64("//lib/i686", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> open("//lib/mmx/libcluster.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
> stat64("//lib/mmx", 0xbfee02c0) = -1 ENOENT (No such file or directory)
> open("//lib/libcluster.so.0", O_RDONLY) = 3
> read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\\\26\0"..., 512) = 512
> fstat64(3, {st_mode=S_IFREG|0755, st_size=16940, ...}) = 0
> old_mmap(NULL, 19876, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x883000
> old_mmap(0x887000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x3000) = 0x887000
> close(3) = 0
> open("//lib/i686/libc.so.6", O_RDONLY) = 3
> read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320]\1"..., 512) = 512
> fstat64(3, {st_mode=S_IFREG|0755, st_size=1461208, ...}) = 0
> old_mmap(NULL, 1256644, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x408000
> old_mmap(0x534000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x12c000) = 0x534000
> old_mmap(0x539000, 7364, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x539000
> close(3) = 0
> ssisys({id=SSISYS_CLUSTER_MAXNODES:4}, 20) = 125
> ssisys({id=SSISYS_SET_NODE_CONTEXT:4, node=2}, 20) = 0
> rexecve("/bin/ls", ["/bin/ls"], [/* 54 vars */], 2) = -1 ESTALE (Stale NFS file handle)
> write(2, " can\'t execute /bin/ls, errno=11"..., 35 can't execute /bin/ls, errno=116.
> ) = 35
> exit_group(-1) = ?
>
>> Brian
>
> _______________________________________________
> ssic-linux-devel mailing list
> ssi...@li...
> https://lists.sourceforge.net/lists/listinfo/ssic-linux-devel
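The checks John asks about can be scripted so they are easy to repeat on each node. A small sketch (the paths are the ones from this thread; printing yes/no for each path is our own convention, so the script is safe to run on any machine):

```shell
#!/bin/sh
# Answer John's questions on the node being debugged:
# do the shared /dev directories exist, and is devfsd running?
for d in /cluster/dev/pts /cluster/node1/dev /cluster/node2/dev; do
    if [ -d "$d" ]; then
        echo "$d: yes"
    else
        echo "$d: no"
    fi
done

# The [d] trick keeps the grep from matching its own command line.
if ps ax | grep -q '[d]evfsd'; then
    echo "devfsd: running"
else
    echo "devfsd: not running"
fi
```

On a healthy SSI node all three directories should report "yes" and devfsd should report "running".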
From: Bharata B R. <bha...@hp...> - 2004-12-03 10:06:25

On Wed, 2004-12-01 at 23:52, John Byrne wrote:
> Bharata B Rao wrote:
>
>> This is the message observed on node 2's console:
>>
>> reop_import_path: no such path: /cluster/dev/pts/2
>>
>> However, I do have this:
>>
>> # ls -l /cluster/dev/pts/2
>> crw--w---- 1 root tty 136, 2 Dec 2 08:23 /cluster/dev/pts/2
>>
>> And /cluster/node2/var/log/messages is not yet created, since syslogd
>> has not yet run on node 2, I assume.
>
> The ESTALE is probably due to the "reop_import_path" error message. If
> you log into the console and try "onnode 2 /bin/ls" does it work any
> better? If it does, then something about the distributed /dev is
> probably the problem. If you run the base kernel, does the
> /cluster/dev/pts directory exist? Do the /cluster/node1/dev and the
> /cluster/node2/dev directories exist? Is devfsd running on each node?

In the present setup, I am running 'onnode 2 /bin/ls' after logging into
node 1's console. I still don't have a console on node 2.

With the base kernel, the /cluster/dev/pts, /cluster/node1/dev and
/cluster/node2/dev directories do exist, and they are empty.

With the ssi kernel, I don't understand why "reop_import_path" should
complain about the non-existence of /cluster/dev/pts (sometimes
/cluster/dev/pts/0 and /cluster/dev/pts/1).

devfsd is running only on node 1. On node 2 only init has run and no
services have started. (running unmodified init still)

# ls -l /cluster/node1/dev/pts
lr-xr-xr-x 1 root root 16 Jan 1 1970 /cluster/node1/dev/pts -> /cluster/dev/pts

# ls -l /cluster/node2/dev/pts
lr-xr-xr-x 1 root root 16 Jan 1 1970 /cluster/node2/dev/pts -> /cluster/dev/pts

# ls -l /dev/pts
lr-xr-xr-x 1 root root 16 Jan 1 1970 /dev/pts -> /cluster/dev/pts

# ls -l /cluster/dev/pts/
total 4
drwxr-xr-x 2 root root 0 Dec 4 04:48 .
drwxr-xr-x 3 root root 4096 Dec 3 08:59 ..
crw--w---- 1 root tty 136, 0 Dec 4 05:00 0
crw--w---- 1 bharata tty 136, 1 Dec 4 04:49 1

Regards,
Bharata.
From: John B. <joh...@hp...> - 2004-12-03 19:40:53

Bharata B Rao wrote:
> On Wed, 2004-12-01 at 23:52, John Byrne wrote:
>
>> Bharata B Rao wrote:
>>
>>> This is the message observed on node 2's console:
>>>
>>> reop_import_path: no such path: /cluster/dev/pts/2
>>>
>>> However, I do have this:
>>>
>>> # ls -l /cluster/dev/pts/2
>>> crw--w---- 1 root tty 136, 2 Dec 2 08:23 /cluster/dev/pts/2
>>>
>>> And /cluster/node2/var/log/messages is not yet created, since syslogd
>>> has not yet run on node 2, I assume.
>>
>> The ESTALE is probably due to the "reop_import_path" error message. If
>> you log into the console and try "onnode 2 /bin/ls" does it work any
>> better? If it does, then something about the distributed /dev is
>> probably the problem. If you run the base kernel, does the
>> /cluster/dev/pts directory exist? Do the /cluster/node1/dev and the
>> /cluster/node2/dev directories exist? Is devfsd running on each node?
>
> In the present setup, I am running 'onnode 2 /bin/ls' after logging into
> node 1's console. I still don't have a console on node 2.
>
> With the base kernel, the /cluster/dev/pts, /cluster/node1/dev and
> /cluster/node2/dev directories do exist, and they are empty.
>
> With the ssi kernel, I don't understand why "reop_import_path" should
> complain about the non-existence of /cluster/dev/pts (sometimes
> /cluster/dev/pts/0 and /cluster/dev/pts/1).
>
> devfsd is running only on node 1. On node 2 only init has run and no
> services have started. (running unmodified init still)

devfsd has to run on every node. Try this and see if it fixes the problem.

> # ls -l /cluster/node1/dev/pts
> lr-xr-xr-x 1 root root 16 Jan 1 1970 /cluster/node1/dev/pts -> /cluster/dev/pts
>
> # ls -l /cluster/node2/dev/pts
> lr-xr-xr-x 1 root root 16 Jan 1 1970 /cluster/node2/dev/pts -> /cluster/dev/pts
>
> # ls -l /dev/pts
> lr-xr-xr-x 1 root root 16 Jan 1 1970 /dev/pts -> /cluster/dev/pts
>
> # ls -l /cluster/dev/pts/
> total 4
> drwxr-xr-x 2 root root 0 Dec 4 04:48 .
> drwxr-xr-x 3 root root 4096 Dec 3 08:59 ..
> crw--w---- 1 root tty 136, 0 Dec 4 05:00 0
> crw--w---- 1 bharata tty 136, 1 Dec 4 04:49 1
>
> Regards,
> Bharata.
From: Bharata B R. <bha...@hp...> - 2004-12-07 14:29:13

On Sat, 2004-12-04 at 01:10, John Byrne wrote:
> Bharata B Rao wrote:
>
>> devfsd is running only on node 1. On node 2 only init has run and no
>> services have started. (running unmodified init still)
>
> devfsd has to run on every node. Try this and see if it fixes the problem.

First, since node 2 is running an unmodified init and the initscripts have
not been modified yet, no services get started on the 2nd node.

Hence, when I tried starting devfsd from linuxrc (of the initrd), I ran
into other problems:

- When the kernel boots on the 2nd node, it doesn't mount devfs by
default. It needs a command line argument (devfs=mount), which I am now
supplying through the /tftpboot/pxelinux.cfg/default file. (But for the
kernel on the initnode, devfs gets mounted even without the devfs=mount
option. Why?)

- With this extra option, the kernel on the 2nd node started creating a
.devfsd entry in its /dev:

# ls -l /dev/.devfsd
crw------- 1 root root 8,0 Dec 8 04:04 /dev/.devfsd

According to the devfs FAQ, the presence of .devfsd indicates that devfs
has been mounted, so I conclude that devfs is getting mounted on the 2nd
node.

- But when trying to start the devfsd daemon, this is the message I
observe:

# /sbin/devfsd /dev
modprobe: modprobe: Can't locate module char-major-8
Error opening file: ".devfsd" No such device

Not sure why there is a search for char-major-8 here, given that devfs is
in the kernel. And not sure why .devfsd can't be found.

Any hints on what might be wrong here?

Regards,
Bharata.
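For readers unfamiliar with PXE booting: the devfs=mount argument described above would be passed on the kernel command line via the PXE config. A hypothetical /tftpboot/pxelinux.cfg/default entry might look like the following (the label and the kernel/initrd file names are made-up placeholders; only the devfs=mount option comes from this thread):

```
default openssi
label openssi
    kernel vmlinuz-openssi
    append initrd=initrd.img devfs=mount
```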
From: John B. <joh...@hp...> - 2004-12-07 19:55:33

Bharata B Rao wrote:
> On Sat, 2004-12-04 at 01:10, John Byrne wrote:
>
>> Bharata B Rao wrote:
>>
>>> devfsd is running only on node 1. On node 2 only init has run and no
>>> services have started. (running unmodified init still)
>>
>> devfsd has to run on every node. Try this and see if it fixes the problem.
>
> First, since node 2 is running an unmodified init and the initscripts
> have not been modified yet, no services get started on the 2nd node.
>
> Hence, when I tried starting devfsd from linuxrc (of the initrd), I ran
> into other problems:
>
> - When the kernel boots on the 2nd node, it doesn't mount devfs by
> default. It needs a command line argument (devfs=mount), which I am now
> supplying through the /tftpboot/pxelinux.cfg/default file. (But for the
> kernel on the initnode, devfs gets mounted even without the devfs=mount
> option. Why?)
>
> - With this extra option, the kernel on the 2nd node started creating a
> .devfsd entry in its /dev:
>
> # ls -l /dev/.devfsd
> crw------- 1 root root 8,0 Dec 8 04:04 /dev/.devfsd
>
> According to the devfs FAQ, the presence of .devfsd indicates that devfs
> has been mounted, so I conclude that devfs is getting mounted on the 2nd
> node.
>
> - But when trying to start the devfsd daemon, this is the message I
> observe:
>
> # /sbin/devfsd /dev
> modprobe: modprobe: Can't locate module char-major-8
> Error opening file: ".devfsd" No such device
>
> Not sure why there is a search for char-major-8 here, given that devfs
> is in the kernel. And not sure why .devfsd can't be found.
>
> Any hints on what might be wrong here?

The fact that you need the devfs=mount option suggests that you don't
have your initrd right. Devfs is supposed to get automatically mounted
in the kernel by ssi_mount_devfs(), called from initproc_postroot_init(),
which will be called by executing "cluster_config --initproc" from the
initrd.

I haven't been following what Bruce has told you to do with the initrd,
but maybe you need to add this.

John

> Regards,
> Bharata.
From: Bharata B R. <bha...@hp...> - 2004-12-08 13:29:20

On Wed, 2004-12-08 at 01:25, John Byrne wrote:
> Bharata B Rao wrote:
>> On Sat, 2004-12-04 at 01:10, John Byrne wrote:
>>
>>> Bharata B Rao wrote:
>>>
>>>> devfsd is running only on node 1. On node 2 only init has run and no
>>>> services have started. (running unmodified init still)
>>>
>>> devfsd has to run on every node. Try this and see if it fixes the problem.
>>
>> First, since node 2 is running an unmodified init and the initscripts
>> have not been modified yet, no services get started on the 2nd node.
>>
>> Hence, when I tried starting devfsd from linuxrc (of the initrd), I ran
>> into other problems:
>>
>> - When the kernel boots on the 2nd node, it doesn't mount devfs by
>> default. It needs a command line argument (devfs=mount), which I am now
>> supplying through the /tftpboot/pxelinux.cfg/default file. (But for the
>> kernel on the initnode, devfs gets mounted even without the devfs=mount
>> option. Why?)
>>
>> - With this extra option, the kernel on the 2nd node started creating a
>> .devfsd entry in its /dev:
>>
>> # ls -l /dev/.devfsd
>> crw------- 1 root root 8,0 Dec 8 04:04 /dev/.devfsd
>>
>> According to the devfs FAQ, the presence of .devfsd indicates that
>> devfs has been mounted, so I conclude that devfs is getting mounted on
>> the 2nd node.
>>
>> - But when trying to start the devfsd daemon, this is the message I
>> observe:
>>
>> # /sbin/devfsd /dev
>> modprobe: modprobe: Can't locate module char-major-8
>> Error opening file: ".devfsd" No such device
>>
>> Not sure why there is a search for char-major-8 here, given that devfs
>> is in the kernel. And not sure why .devfsd can't be found.
>>
>> Any hints on what might be wrong here?
>
> The fact that you need the devfs=mount option suggests that you don't
> have your initrd right. Devfs is supposed to get automatically mounted
> in the kernel by ssi_mount_devfs(), called from initproc_postroot_init(),
> which will be called by executing "cluster_config --initproc" from the
> initrd.

Ok, that explains, I believe, why I wasn't able to start devfsd. Now it
is clear that devfs gets mounted only during the initproc option of
cluster_config; I was always trying to start devfs before exec'ing
cluster_config --initproc.

With the linuxrc version attached below (linuxrc1), things won't proceed
much, because we didn't have an ssi-modified sysvinit till now. I will
report progress when I try again with the ssi-modified sysvinit.

I tried to get things moving further with unmodified sysvinit by making
some changes to linuxrc (attached below as linuxrc2). Here everything is
done manually, including cluster_config --initproc, starting of devfs and
mounting of /proc.

However, until devpts is mounted from node 2, onnode won't work. But
manual mounting of devpts from linuxrc panics the system somewhere in the
mount code. Need to investigate this further.

Regards,
Bharata.

> I haven't been following what Bruce has told you to do with the initrd,
> but maybe you need to add this.
>
> John
From: Brian J. W. <Bri...@hp...> - 2004-12-08 22:49:11

Bharata B Rao wrote:
> Ok, that explains, I believe, why I wasn't able to start devfsd. Now it
> is clear that devfs gets mounted only during the initproc option of
> cluster_config; I was always trying to start devfs before exec'ing
> cluster_config --initproc.
>
> With the linuxrc version attached below (linuxrc1), things won't proceed
> much, because we didn't have an ssi-modified sysvinit till now. I will
> report progress when I try again with the ssi-modified sysvinit.
>
> I tried to get things moving further with unmodified sysvinit by making
> some changes to linuxrc (attached below as linuxrc2). Here everything is
> done manually, including cluster_config --initproc, starting of devfs
> and mounting of /proc.
>
> However, until devpts is mounted from node 2, onnode won't work. But
> manual mounting of devpts from linuxrc panics the system somewhere in
> the mount code. Need to investigate this further.

Why aren't you running 'cluster_config --initproc'? It should only try to
start /sbin/init on the first node, so it shouldn't be a problem. You
don't need an init running on node 2, but running 'cluster_config
--initproc' should mount devfs and do anything else you need to
successfully run onnode commands to node 2 (aside from manually running
clusternode_setstate).

Brian
From: John B. <joh...@hp...> - 2004-12-08 23:09:30

Bharata B Rao wrote:
> On Wed, 2004-12-08 at 01:25, John Byrne wrote:
>
>> Bharata B Rao wrote:
>>
>>> On Sat, 2004-12-04 at 01:10, John Byrne wrote:
>>>
>>>> Bharata B Rao wrote:
>>>>
>>>>> devfsd is running only on node 1. On node 2 only init has run and no
>>>>> services have started. (running unmodified init still)
>>>>
>>>> devfsd has to run on every node. Try this and see if it fixes the problem.
>>>
>>> First, since node 2 is running an unmodified init and the initscripts
>>> have not been modified yet, no services get started on the 2nd node.
>>>
>>> Hence, when I tried starting devfsd from linuxrc (of the initrd), I ran
>>> into other problems:
>>>
>>> - When the kernel boots on the 2nd node, it doesn't mount devfs by
>>> default. It needs a command line argument (devfs=mount), which I am now
>>> supplying through the /tftpboot/pxelinux.cfg/default file. (But for the
>>> kernel on the initnode, devfs gets mounted even without the devfs=mount
>>> option. Why?)
>>>
>>> - With this extra option, the kernel on the 2nd node started creating a
>>> .devfsd entry in its /dev:
>>>
>>> # ls -l /dev/.devfsd
>>> crw------- 1 root root 8,0 Dec 8 04:04 /dev/.devfsd
>>>
>>> According to the devfs FAQ, the presence of .devfsd indicates that
>>> devfs has been mounted, so I conclude that devfs is getting mounted on
>>> the 2nd node.
>>>
>>> - But when trying to start the devfsd daemon, this is the message I
>>> observe:
>>>
>>> # /sbin/devfsd /dev
>>> modprobe: modprobe: Can't locate module char-major-8
>>> Error opening file: ".devfsd" No such device
>>>
>>> Not sure why there is a search for char-major-8 here, given that devfs
>>> is in the kernel. And not sure why .devfsd can't be found.
>>>
>>> Any hints on what might be wrong here?
>>
>> The fact that you need the devfs=mount option suggests that you don't
>> have your initrd right. Devfs is supposed to get automatically mounted
>> in the kernel by ssi_mount_devfs(), called from initproc_postroot_init(),
>> which will be called by executing "cluster_config --initproc" from the
>> initrd.
>
> Ok, that explains, I believe, why I wasn't able to start devfsd. Now it
> is clear that devfs gets mounted only during the initproc option of
> cluster_config; I was always trying to start devfs before exec'ing
> cluster_config --initproc.
>
> With the linuxrc version attached below (linuxrc1), things won't proceed
> much, because we didn't have an ssi-modified sysvinit till now. I will
> report progress when I try again with the ssi-modified sysvinit.
>
> I tried to get things moving further with unmodified sysvinit by making
> some changes to linuxrc (attached below as linuxrc2). Here everything is
> done manually, including cluster_config --initproc, starting of devfs
> and mounting of /proc.
>
> However, until devpts is mounted from node 2, onnode won't work. But
> manual mounting of devpts from linuxrc panics the system somewhere in
> the mount code. Need to investigate this further.
>
> Regards,
> Bharata.

Your linuxrc2 is a problem. The "cluster_config --initproc" must be
exec'ed, because it is supposed to turn pid 2 into the reaping process on
each node. Not doing this may make things strange.

Try using "clusternode_setstate UP" to mark the node fully up and then
start a shell with onnode.

John
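John's point about exec'ing "cluster_config --initproc" can be illustrated with a small, self-contained sketch. The cluster_config invocation in the comment is from his message; the echo demonstration below it is our own stand-in, runnable on any machine:

```shell
#!/bin/sh
# In the real linuxrc, the script would end with:
#
#   exec /sbin/cluster_config --initproc
#
# exec *replaces* the current process instead of forking a child, so the
# process that was running linuxrc (pid 2 in this context) itself becomes
# cluster_config, which can then serve as the per-node reaping process.
#
# The stand-in below shows the key property: nothing after exec ever runs,
# because the shell has been replaced by the exec'd program.
sh -c 'exec echo "replaced the shell"; echo "never reached"'
```

Running this prints only "replaced the shell"; the second echo is dead code, which is exactly why a plain (non-exec) invocation of cluster_config would leave the wrong process alive afterward.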