Thread: [SSI-devel] Re: Problem with SSI Installation
From: Andreas R. <ro...@co...> - 2004-03-29 13:36:30
Aneesh Kumar KV wrote:
> Andreas Roos wrote:
>>>> The problem was that the installation with install-openssi had some
>>>> dependencies on packages I hadn't installed. Now I was able to
>>>> create the ramdisk, but there is a problem when I boot the system
>>>> with the SSI kernel. (I am using the kernel and modules from
>>>> http://www.openssi.org/contrib/debian/)
>>>>
>>>>     RAMDISK: Compressed image found at block 0
>>>>     :
>>>>     Note: unable to open serial console.
>>>>     Mounting /proc
>>>>     attempt to access beyond end of device
>>>>     01:00: rw=0, want=9099, limit=9000
>>>
>>> I think the ramdisk size has increased a lot with all those modules.
>>> You can disable a lot of modules by editing
>>> /etc/mkinitrd-openssi/mkinitrd.conf. Change
>>>
>>>     MODULES=most
>>>
>>> to
>>>
>>>     MODULES=dep
>>
>> With this option the error above no longer occurs, but the problem
>> now is that the module for the network card is not yet loaded when
>> linuxrc is called. ifconfig doesn't find the network card, so the
>> system cannot boot. I tried to load the module for the network card
>> myself, but I didn't succeed. Is it possible to make the ramdisk
>> large enough to hold the modules with the option MODULES=most?
>
> For that you will have to rebuild the kernel. I guess MODULES=most is
> enough to make sure that all the modules get loaded. BTW, can you send
> me the content of linuxrc from the ramdisk
> (<tempo_dir>/initrd/linuxrc) and of loadmodules? I am not sure why the
> network module is not getting loaded. Also, what you can try to fix
> the locale problem is to add
>
>     export LC_ALL="C"
>
> below the line "call /loadmodules".

Hello.

LC_ALL="C" does work; it switches the output to English. I also
recompiled the kernel with a ramdisk limit of 30000 (it was 9000
before). After that I was able to boot the system, and it works fine.
Thank you for your help.

But now I have another problem, with network booting the second node. I
created a network boot image using

    mkelf-linux --output=/tftpboot/ssi.nb vmlinuz-2.4.20-pre-bigram initrd-ssi.gz

The tftp server is started and working. On the second node I boot the
Etherboot image with lilo, but after getting the image the boot stops:

    Searching for server (DHCP)...
    Me: 10.19.250.104, Server: 10.19.250.52
    Loading 10.19.250.52:/tftpboot/ssi.nb (ELF)... done
    mknbi-1.2-12/first32.c (ELF) (GPL)
    Top of ramdisk is 0x08000000
    Ramdisk at 0x07cc2000, size 0x0033e000

After that the system hangs. Do you have any idea what's wrong?

Andreas
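(For readers following the thread: the two workarounds discussed above
amount to roughly the following. The config path is the one given in
the message; the kernel option name and boot parameter are assumptions
based on stock 2.4 kernels, since the message only says the kernel was
rebuilt with a 30000-block limit.)

    # Shrink the initrd by including only modules for detected hardware:
    sed -i 's/^MODULES=most/MODULES=dep/' /etc/mkinitrd-openssi/mkinitrd.conf

    # Or enlarge the ramdisk limit instead. On a stock 2.4 kernel that
    # is CONFIG_BLK_DEV_RAM_SIZE at kernel build time, or at boot time
    # via lilo.conf:
    #     append="ramdisk_size=30000"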
From: Chirag K. <chi...@hp...> - 2004-03-29 13:44:50
Andreas Roos <ro...@co...> writes:
> Searching for server (DHCP)...
> Me: 10.19.250.104, Server: 10.19.250.52
> Loading 10.19.250.52:/tftpboot/ssi.nb (ELF)... done
> mknbi-1.2-12/first32.c (ELF) (GPL)
> Top of ramdisk is 0x08000000
> Ramdisk at 0x07cc2000, size 0x0033e000

You might want to upgrade the version of mknbi and check again. We use
version 1.4.3 on Red Hat (1.2-12 seems to be a bit old).
http://etherboot.sourceforge.net/distribution.html

-- 
Chirag Kantharia, Industry Standard Servers
Hewlett-Packard India Software Operations, Bangalore, India.
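(Upgrading mknbi is a quick fetch-and-build. The tarball name below is
a guess; check the distribution page linked above for the real file.
mkelf-linux is then rerun exactly as in the earlier message.)

    # Hypothetical fetch-and-build of the newer mknbi:
    wget http://etherboot.sourceforge.net/.../mknbi-1.4.3.tar.gz
    tar xzf mknbi-1.4.3.tar.gz && cd mknbi-1.4.3
    make && make install
    # Regenerate the netboot image with the new tools:
    mkelf-linux --output=/tftpboot/ssi.nb vmlinuz-2.4.20-pre-bigram initrd-ssi.gz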
From: Andreas R. <ro...@co...> - 2004-03-29 14:15:20
Chirag Kantharia wrote:
> You might want to upgrade the version of mknbi and check again. We use
> version 1.4.3 on Red Hat (1.2-12 seems to be a bit old).
> http://etherboot.sourceforge.net/distribution.html

Thanks, you were right about that; the version was too old. But now I
get this error message:

    VFS: Mounted root (ext2 filesystem)
    Freeing unused...
    Note: Unable to open serial console
    Mounting /proc
    Gathering cluster info
    ERROR: Could not find the NIC used to add this node to the cluster.
    Unable to continue. Halting.

Does somebody know what the reason for that problem could be?

Andreas
From: Aneesh K. KV <ane...@di...> - 2004-03-29 14:34:47
Andreas Roos wrote:
> But now I get this error message:
>
>     ERROR: Could not find the NIC used to add this node to the cluster.
>     Unable to continue. Halting.

You don't have the driver for the network card.

-aneesh

-- 
ph: 603-884-5742
From: Chirag K. <chi...@hp...> - 2004-03-30 05:31:52
Aneesh Kumar KV <ane...@di...> writes:
>> ERROR: Could not find the NIC used to add this node to the cluster.
>> Unable to continue. Halting.
<snip>
> You don't have the driver for the network card.

The linuxrc tries to match the hw address of the NIC found on the
system with an entry in /etc/clustertab. If the module for the NIC
included in the ramdisk is wrong, then ifconfig fails, because of which
linuxrc spits out the above error.

If you have a cluster with heterogeneous NICs, then make sure that you
add the kernel modules to the initrd with the --with option. E.g.

    mkinitrd --with=eepro100 --with=8139too ....

By default, the initrd includes all the kernel modules mentioned in
/etc/modules.conf (in the form of "alias eth0 ..." lines).

-- 
Chirag Kantharia, Industry Standard Servers
Hewlett-Packard India Software Operations, Bangalore, India.
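(To make the matching step concrete: the MAC address ifconfig reports
for the NIC must appear in the initrd's copy of /etc/clustertab. A
rough check, with an illustrative MAC prefix; the real matching logic
lives in linuxrc, so the grep is only a sanity check:)

    # Build a Red Hat-style initrd that carries both NIC drivers:
    mkinitrd --with=eepro100 --with=8139too /boot/initrd-ssi.img 2.4.20-pre-bigram

    # Verify the NIC's hw address matches a clustertab entry:
    ifconfig eth0 | grep HWaddr
    grep -i '00:02:b3' /etc/clustertab    # example MAC prefix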
From: Brian J. W. <Bri...@hp...> - 2004-03-30 08:05:31
Chirag Kantharia wrote:
> By default, the initrd includes all the kernel modules mentioned in
> /etc/modules.conf (in the form of "alias eth0 ..." lines).

The following line also works:

    alias eth-extra foo

You can repeat 'alias eth-extra ...' as many times as you need.

BTW, someone should enhance mkinitrd to have an --update option. It has
a limited form of this with the --tabonly option, which, however, only
updates the initrd's copy of /etc/clustertab. A true --update option
would also copy in any new network drivers from /etc/modules.conf, and
maybe do the proper update to /linuxrc if chard is enabled/disabled for
the root. There might be some other things --update could do, too.

Brian
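(Putting both conventions side by side; the driver names are
placeholders for whatever NICs your nodes actually have:)

    # /etc/modules.conf (Red Hat): picked up by mkinitrd automatically
    alias eth0 eepro100
    # Extra drivers for nodes with different NICs, repeated as needed:
    alias eth-extra 8139too
    alias eth-extra tulip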
From: Chirag K. <chi...@hp...> - 2004-03-30 08:36:41
On Tue, Mar 30, 2004 at 12:05:08AM -0800, Brian J. Watson wrote:
| BTW, someone should enhance mkinitrd to have an --update option.
<snip>

<sigh> I'd volunteered to do that, but I've had too many to-do items to
take care of. If anybody else feels like taking it up, I can help.

-- 
Chirag Kantharia, Industry Standard Servers
Hewlett-Packard India Software Operations, Bangalore, India.
From: Chirag K. <chi...@hp...> - 2004-03-30 09:23:44
On Tue, Mar 30, 2004 at 11:10:25AM +0200, Andreas Roos wrote:
| >mkinitrd --with=eepro100 --with=8139too ....
| Which version of mkinitrd do you use? It seems that my version 1.161
| (2004/02/27) does not have an option --with.

I'm sorry; I recall now that you're using Debian. For Debian, please
update /etc/mkinitrd/modules with the names of the relevant modules for
the NICs. If you're specifying mkcramfs in /etc/mkinitrd/mkinitrd.conf
to build the initrd, then make sure that you have CRAMFS support built
into the kernel.

| In order to load the additional module 8139too.o, what line do I have
| to add to /etc/modules before calling mkinitrd?
| alias eth0 8139too ?
| or
| alias eth-extra 8139too

This applies only to the Red Hat mkinitrd.

-- 
Chirag Kantharia, Industry Standard Servers
Hewlett-Packard India Software Operations, Bangalore, India.
From: Aneesh K. KV <ane...@di...> - 2004-03-30 14:14:41
Chirag Kantharia wrote:
> I'm sorry; I recall now that you're using Debian. For Debian, please
> update /etc/mkinitrd/modules with the names of the relevant modules
> for the NICs. If you're specifying mkcramfs in
> /etc/mkinitrd/mkinitrd.conf to build the initrd, then make sure that
> you have CRAMFS support built into the kernel.

Debian OpenSSI doesn't work that way; we use our own mkramdisk, so
there is no need for cramfs support. Also, it is not /etc/mkinitrd but
/etc/mkinitrd-openssi/modules. But then he has to get the latest
initrd-tools from CVS; the linuxrc call to load modules was added only
recently.

-aneesh

-- 
ph: 603-884-5742
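(On Debian, then, the whole recipe reduces to something like the
following. The module-file path is the one given above; the rebuild
command is an assumption, since the OpenSSI initrd-tools wrapper may
differ from plain Debian mkinitrd:)

    # Tell the OpenSSI initrd builder to include the NIC driver:
    echo 8139too >> /etc/mkinitrd-openssi/modules

    # Rebuild the ramdisk (plain Debian mkinitrd shown for illustration):
    mkinitrd -o /boot/initrd-ssi.gz 2.4.20-pre-bigram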
From: Andreas R. <ro...@co...> - 2004-03-30 14:37:07
Aneesh Kumar KV wrote:
> Debian OpenSSI doesn't work that way; we use our own mkramdisk, so
> there is no need for cramfs support. Also, it is not /etc/mkinitrd but
> /etc/mkinitrd-openssi/modules. But then he has to get the latest
> initrd-tools from CVS; the linuxrc call to load modules was added only
> recently.

I added the module to /etc/mkinitrd-openssi/modules, but it doesn't
work either. To load the module 8139too.o, is the only thing to do to
add

    8139too

to /etc/mkinitrd-openssi/modules and recreate the ramdisk? After that
the module is found in the directories of the ramdisk, but it is not
loaded. The error message that the NIC could not be found is still
there. If I recompile the kernel with the 8139 support built in (not as
a module), it works. Of course this isn't the best solution if you want
to use the kernel with different hardware, but for the moment, in order
to test OpenSSI, it is all right.

I am trying to find a solution for a highly available server (Linux
cluster) for medical image data for my diploma thesis, and I suppose
that OpenSSI is a possibility such a solution could be based on. I have
already tested openMosix, Kimberlite, and Grid Engine from Sun.

Andreas
From: Aneesh K. KV <ane...@di...> - 2004-03-30 23:10:52
Andreas Roos wrote:
> I added the module to /etc/mkinitrd-openssi/modules, but it doesn't
> work either. <snip> After that the module is found in the directories
> of the ramdisk, but it is not loaded. The error message that the NIC
> could not be found is still there.

I am not sure why it doesn't work for you. I tested module loading from
the initrd using fat.o, and for me it worked fine, in the sense that
after booting the cluster, lsmod showed the module was loaded. If you
can get me some more info, maybe I will be able to find out what went
wrong. Here is what you can do:

1) Create a ramdisk as you did above, also adding /sbin/lsmod to
   /etc/mkinitrd-openssi/exe.
2) Loop-mount the ramdisk.
3) Check that <mount_point>/linuxrc has a "call /loadmodules" line.
4) Check that <mount_point>/loadmodules has all the commands to load
   the modules. You should find "modprobe <your network module>" here.
5) After the "call /loadmodules" line, insert a call to a shell
   (/bin/bash). This should give you a shell prompt at boot.
6) Boot with this initrd.
7) Run ifconfig and see whether the network card is listed.
8) Get the output of /sbin/lsmod.
9) Try modprobe with your network module.
10) If it fails, capture the error message.

-aneesh

-- 
ph: 603-884-5742
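(A concrete sketch of steps 1-5 above. File names follow the thread,
and the gunzip step assumes the initrd image is gzip-compressed:)

    # Step 1: add lsmod to the ramdisk build, then regenerate the initrd.
    echo /sbin/lsmod >> /etc/mkinitrd-openssi/exe

    # Step 2: loop-mount the resulting image.
    gunzip -c /boot/initrd-ssi.gz > /tmp/initrd.img
    mkdir -p /mnt/initrd
    mount -o loop /tmp/initrd.img /mnt/initrd

    # Steps 3 and 4: inspect the boot scripts.
    grep loadmodules /mnt/initrd/linuxrc
    grep modprobe /mnt/initrd/loadmodules
    # Step 5: edit /mnt/initrd/linuxrc, add "/bin/bash" after the
    # "call /loadmodules" line, then umount and re-gzip the image.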
From: Chirag K. <chi...@hp...> - 2004-03-31 08:39:58
On Tue, Mar 30, 2004 at 09:48:09AM -0500, Aneesh Kumar KV wrote:
| >To load the module 8139too.o, is the only thing to do to add 8139too
| >to /etc/mkinitrd-openssi/modules and recreate the ramdisk?
<snip>
| I am not sure why it doesn't work for you. I tested module loading
| from the initrd using fat.o, and for me it worked fine.
<snip>

To start with, you could just see if the 8139too module is included in
the initrd:

    find /<mnt_point_of_initrd> -name 8139too\*

It might just be possible that the 8139too module is not present on the
system on which you are running mkinitrd.

-- 
Chirag Kantharia, Industry Standard Servers
Hewlett-Packard India Software Operations, Bangalore, India.
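(Both checks together; the kernel version string is an example:)

    # Is the module inside the initrd?
    find /mnt/initrd -name '8139too*'
    # Is it present on the build host at all? If not, mkinitrd has
    # nothing to copy in.
    find /lib/modules/2.4.20-pre-bigram -name '8139too*'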
From: Brian J. W. <Bri...@hp...> - 2004-03-31 20:17:36
Chirag Kantharia wrote:
> To start with, you could just see if the 8139too module is included in
> the initrd (find /<mnt_point_of_initrd> -name 8139too\*).
>
> It might just be possible that the 8139too module is not present on
> the system on which you are running mkinitrd.

The 8139too module depends on the mii module. Is mii getting included
in the initrd?

Brian
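(If it isn't, listing the dependency explicitly should be harmless.
Whether the OpenSSI module list needs mii spelled out is an assumption;
modprobe normally resolves it by itself via modules.dep:)

    # Make sure the dependency goes into the ramdisk too:
    echo mii     >> /etc/mkinitrd-openssi/modules
    echo 8139too >> /etc/mkinitrd-openssi/modules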
From: Andreas R. <ro...@co...> - 2004-03-31 14:28:01
Aneesh Kumar KV wrote:
> I am not sure why it doesn't work for you. <snip> Here is what you
> can do:
> <snip - 10-step debugging procedure>

I will do that later. I suspect that loadmodules is not called.

At the moment I am testing the failover function of OpenSSI, and I have
a question about it. Is it possible to use local drives in every
initnode for the root filesystem? All the initnodes have a drive with
the "same" root filesystem. The node that becomes the init node mounts
its local partition and all the nodes use it. I have already tried this
and it works. My question is whether it is possible to activate root
failover without a shared disk: after the init node is disconnected
from the cluster, one of the other nodes takes over and mounts its
local root filesystem. Of course it would not know about any changes to
the filesystem on the previous init node, but is it possible for the
cluster to go on working?

Andreas
From: Andreas R. <ro...@co...> - 2004-03-31 14:33:08
Andreas Roos wrote:
> <snip - same question about root failover with local disks as above>

PS: By the way, is it possible to have failover for more than 2 nodes
if they all have direct access to the devices (Fibre Channel, FireWire,
etc.)?

    LABEL=/test   /test   ext3   rw,node=1:2:3   1 2
From: Aneesh K. KV <ane...@di...> - 2004-03-31 14:47:03
Andreas Roos wrote:
> PS: By the way, is it possible to have failover for more than 2 nodes
> if they all have direct access to the devices (Fibre Channel,
> FireWire, etc.)?
>
>     LABEL=/test   /test   ext3   rw,node=1:2:3   1 2

BTW, I haven't tested the Debian packages in failover mode (I don't
have the hardware to test it), so I am not sure how it will work. But I
have made sure that all failover-related changes to the scripts are in
the Debian branches too.

-aneesh

-- 
ph: 603-884-5742
From: Brian J. W. <Bri...@hp...> - 2004-03-31 20:28:32
OpenSSI mailing list contributors-

I don't want to single anyone out, but these deeply nested replies are
a real drag. I have to scroll down a page or more before finding any
new message text. When replying to messages, can you include only the
context that you're responding to?

Thanks,
Brian
From: Brian J. W. <Bri...@hp...> - 2004-03-31 20:55:41
Andreas Roos wrote:
> My question is whether it is possible to activate root failover
> without a shared disk: after the init node is disconnected from the
> cluster, one of the other nodes takes over and mounts its local root
> filesystem. Of course it would not know about any changes to the
> filesystem on the previous init node, but is it possible for the
> cluster to go on working?

In theory this should work, but it's not terribly SSI to have a
different root filesystem after failover. Some apps might get upset
after seeing the change and die (or do something worse).

It might be better to use the Distributed Replicated Block Device
(DRBD) to mirror data between local disks on your potential initnodes.
Jaideep Dharap has successfully tested OpenSSI failover with DRBD, but
it required some hacking of the initrd to make it work. There's more
work to be done before an easy-to-use OpenSSI/DRBD integration is ready
to include in a future release.

Hope this helps,
Brian
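(For reference, a DRBD mirror between two potential initnodes is
described by a resource definition roughly like the one below. This is
only a sketch: the syntax follows later, 0.7-era drbd.conf files, the
host names and addresses are made up, and the initrd hacking mentioned
above is not shown.)

    resource r0 {
      protocol C;                     # synchronous replication
      on node1 {
        device    /dev/drbd0;         # replicated block device
        disk      /dev/hda3;          # backing local partition
        address   10.19.250.104:7788;
        meta-disk internal;
      }
      on node2 {
        device    /dev/drbd0;
        disk      /dev/hda3;
        address   10.19.250.105:7788;
        meta-disk internal;
      }
    }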
From: Aneesh K. KV <ane...@di...> - 2004-03-31 21:00:13
Brian J. Watson wrote:
[..snip..]
> It might be better to use the Distributed Replicated Block Device
> (DRBD) to mirror data between local disks on your potential
> initnodes.

I also remember that during migration, in the file-reopening path, we
check that the inode number is the same, right? Are there any such
restrictions during failover?

-aneesh

-- 
ph: 603-884-5742
From: David B. Z. <dav...@hp...> - 2004-03-31 21:49:58
Aneesh Kumar KV wrote:
> I also remember that during migration, in the file-reopening path, we
> check that the inode number is the same, right? Are there any such
> restrictions during failover?

Yes, failover uses inode numbers. A process writing to an open file
will end up writing data into a different file after a failover, that
is, if the failover doesn't OOPS the kernel first because it can't find
an on-disk inode for a given inode number.

-- 
David B. Zafman           | Hewlett-Packard Company
mailto:dav...@hp...       | http://www.hp.com
"Computer Science" is no more about computers than astronomy is about
telescopes  - E. W. Dijkstra
From: Brian J. W. <Bri...@hp...> - 2004-03-31 22:37:54
Aneesh Kumar KV wrote:
> I also remember that during migration, in the file-reopening path, we
> check that the inode number is the same, right? Are there any such
> restrictions during failover?

Good point. I don't know if there's any code that ensures inodes are
the same after failover, but there's going to be a serious mismatch
between CFS' perception of the filesystem and the layout that actually
exists on disk. This is potentially very dangerous. I retract my claim
that this scheme should work in theory.

Brian
From: Jose R.V. <jrv...@cn...> - 2004-04-15 09:50:57
Brian J. Watson <Brian.J.Watson <at> hp.com> writes:
> Good point. I don't know if there's any code that ensures inodes are
> the same after failover, but there's going to be a serious mismatch
> between CFS' perception of the filesystem and the layout that actually
> exists on disk. This is potentially very dangerous. I retract my claim
> that this scheme should work in theory.

I would really welcome a failover-over-local-disks feature, especially
for cheap clusters with no shared disks or for dynamic clusters of user
workstations. This should be feasible if the root filesystem is
read-only and exactly mirrored on all failover nodes (a la
cat /dev/hda1 > othernode:/dev/hda1).

Indeed, you don't need /var to be on the root filesystem, and
everything else might be read-only or on separate filesystems (say,
/usr/local), just like on live CD-ROM distributions such as Knoppix.

My reason for wanting the feature is that I don't trust shared-disk
roots: if the disk fails you have a single point of failure, and that
may happen for a number of reasons (e.g. power or network failure).
Replication of the root on other "local" disks (which might be shared
by a different set of nodes, or purely local) would ensure survival in
these circumstances.

For instance, here I have two separate power circuits, and it's not
uncommon for one of them to fail. I have two clusters, one on each
circuit, so when one circuit fails there is always one cluster alive. I
would like to consolidate both clusters into a single one, but if the
circuit with the root disk failed, all nodes would fail, instead of
just half of them as it is now.

At one extreme, each node might have a local, read-only copy of the
root filesystem with everything else mounted from elsewhere, avoiding
any need for network booting (which has its own concerns): each node
would boot from its local copy, release it as it joins a cluster
defined in a map file, and recall its local root and run for election
as the new master if the master fails.

This would be interesting for clusters made of user machines, where a
master root disk failure might render lots of personal machines useless
until the disk is recovered. In the extreme case, each machine should
be able to run stand-alone with access to local disks, hence
guaranteeing maximal availability to its owner, and rejoin the cluster
as nodes become available again.

That, for instance, is one of the things I like in Mosix: nodes may
become totally isolated and still keep working, rejoining the cluster
as they come back. Of course, remote resources aren't available during
the isolated period, but the systems are still usable even if totally
isolated (e.g. a laptop on a journey).
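(The "cat /dev/hda1 > othernode:/dev/hda1" notation above is shorthand;
a one-shot mirror of a quiescent root partition might look like the
sketch below. Device names and the host name are examples, and both
copies must be unmounted or mounted read-only while the copy runs.)

    # Block-level copy of the root partition to the failover node:
    dd if=/dev/hda1 bs=1M | ssh othernode 'dd of=/dev/hda1 bs=1M'

    # Cheap integrity check afterwards:
    md5sum /dev/hda1
    ssh othernode md5sum /dev/hda1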
From: Dave P. <dp...@w3...> - 2004-04-15 11:57:25
This sounds a whole lot like an AFS variant, with the notable exception
of the root AFS node(s) being the root cluster node(s).

Kind Regards,
-dsp

> -----Original Message-----
> From: Jose R. Valverde
> Sent: Thursday, April 15, 2004 5:15 AM
> Subject: [SSI-devel] Re: Failover with local disks
> <snip>
From: Mike F. <mf...@ma...> - 2004-04-16 23:28:39
Dave Paris wrote:
> This sounds a whole lot like an AFS variant, with the notable
> exception of the root AFS node(s) being the root cluster node(s).
>
> Kind Regards,
> -dsp

Yes, check out OpenAFS. The only drawback of the stable 1.2 tree is a
2GB file size limit, but that has been fixed in the 1.3 dev tree. It
has a lot of activity on the mailing lists, though it is limited to the
2.4 kernel for now.

Coda is interesting but less actively maintained, and it has an LGPL
license. OpenAFS is in more distributions, whereas Coda still has the
2GB file size limit with no fix in sight; Coda does work with both the
2.4 and 2.6 kernel series, though.

Lustre I haven't checked into yet, but I do know it's limited to a 2.4
kernel (though I have seen activity on LKML showing progress on 2.6
porting and integration). I've also heard that its semantics should be
the same as a local filesystem's, rather than sync-on-close (Coda),
sync-on-close-or-fsync (OpenAFS), or sync-on-memory-pressure (NFS).

I'm researching this for a hot-failover system on my servers.

Mike