From: Dan M. <dan...@or...> - 2009-04-29 22:33:22
|
If you can point out the place(s) in the initrd scripts
that does the actual "mount" of the ocfs2 fs, I will try
adding the "-o heartbeat=local" option to see if the problem
goes away.

Dan

> -----Original Message-----
> From: Dan Magenheimer
> Sent: Wednesday, April 29, 2009 1:26 PM
> To: Dan Magenheimer; Marc Grimme;
> ope...@li...
> Subject: RE: [OSR-users] New Preview RPMs for next Release
> Candidate of comoonics-bootimage
>
> > > > I *think* an ocfs2 kernel patch is required, or possibly
> > > > a "-o heartbeat=local" option might need to be specified.
> > > This I didn't really understand. Cause for me it worked
> > > without something like this.
> >
> > I think you are probably working with a xen-linux guest with
> > kernel version 2.6.18? I think the patch that causes
> > this problem came after linux-2.6.19 in this Linux commit:
> >
> > http://www.kernel.org/hg/index.cgi/linux-2.6/rev/710f6c6bd06c
>
> OK, here's what the ocfs2 team told me:
>
> The patch in Linux is a safety check for handling the case
> where the heartbeat is on the same disk as the filesystem
> data. If this is the case (and I think this is always true
> for OSR), you must either use the mount.ocfs2 command
> to do the mount (which does the right thing and notifies
> the kernel about it) or must specify "-o heartbeat=local" if
> you use the mount command.
>
> So since I am using an upstream Linux (2.6.29), I had to
> workaround this by hacking an ocfs2 kernel source file.
>
> Dan
|
From: Dan M. <dan...@or...> - 2009-04-29 19:27:08
|
> > > I *think* an ocfs2 kernel patch is required, or possibly
> > > a "-o heartbeat=local" option might need to be specified.
> > This I didn't really understand. Cause for me it worked
> > without something like this.
>
> I think you are probably working with a xen-linux guest with
> kernel version 2.6.18? I think the patch that causes
> this problem came after linux-2.6.19 in this Linux commit:
>
> http://www.kernel.org/hg/index.cgi/linux-2.6/rev/710f6c6bd06c

OK, here's what the ocfs2 team told me:

The patch in Linux is a safety check for handling the case
where the heartbeat is on the same disk as the filesystem
data. If this is the case (and I think this is always true
for OSR), you must either use the mount.ocfs2 command
to do the mount (which does the right thing and notifies
the kernel about it) or must specify "-o heartbeat=local" if
you use the mount command.

So since I am using an upstream Linux (2.6.29), I had to
workaround this by hacking an ocfs2 kernel source file.

Dan
|
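[Editor's sketch] The workaround described in the message above (passing "-o heartbeat=local" when the plain mount command is used on an ocfs2 device) could be wrapped roughly as follows. This is only an illustration under stated assumptions: the function name ocfs2_mount_opts is hypothetical and is not part of the OSR boot scripts, and the device and mount point in the usage comment are placeholders.

```shell
# Hypothetical helper, not taken from the OSR scripts.
# Prints the mount options to use for a filesystem type, appending
# heartbeat=local for ocfs2 unless a heartbeat= option is already present.
ocfs2_mount_opts() {
    fstype=$1
    opts=${2:-rw}
    case "$fstype" in
        ocfs2)
            case ",$opts," in
                *,heartbeat=*) echo "$opts" ;;                 # already set, keep as is
                *)             echo "$opts,heartbeat=local" ;; # add the local heartbeat flag
            esac
            ;;
        *)
            echo "$opts"                                       # other filesystems: unchanged
            ;;
    esac
}

# Usage (placeholder device and mount point):
#   mount -t ocfs2 -o "$(ocfs2_mount_opts ocfs2 rw)" /dev/xvda1 /mnt/newroot
```

Using mount.ocfs2 instead, as the ocfs2 team suggested, would make such a helper unnecessary.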
From: Dan M. <dan...@or...> - 2009-04-29 13:27:13
|
> > FYI, the FATAL messages appear to be harmless and I have
> > now successfully booted to a login prompt, then booted
> > a second and third VM using the OSR also to login prompts!
> Great. Sound good.
>
> May I ask what kind of FATAL messages you get? The ones
> during module loading?

Yes, they were module loading errors. Even though they
report as FATAL, I think since all of those modules are
compiled-in, they are not really fatal.

> > I *think* an ocfs2 kernel patch is required, or possibly
> > a "-o heartbeat=local" option might need to be specified.
> This I didn't really understand. Cause for me it worked
> without something like this.

I think you are probably working with a xen-linux guest with
kernel version 2.6.18? I think the patch that causes
this problem came after linux-2.6.19 in this Linux commit:

http://www.kernel.org/hg/index.cgi/linux-2.6/rev/710f6c6bd06c

> > Let me reproduce everything and check on that. If you
> > have any changed/newer rpm's I should use, please
> > send them or send download URLs. Otherwise, I will
> > use this list and hack in any changes needed.
> I only know of the one with the distribution detection. This
> will go upstream with the next release (comoonics-bootimage-1.4-21).

OK, if you could send me email when it is on download.atix.de,
I would appreciate it.

Thanks,
Dan
|
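[Editor's sketch] Whether a "FATAL: Module ... not found" report is harmless depends on whether the driver is compiled into the kernel. Elsewhere in this thread it is suggested to look for "=y" in the kernel .config; a minimal version of that check might look like this. The function name is_builtin is hypothetical, and the option names in the usage comment are only examples, not a list used by the OSR scripts.

```shell
# Hypothetical helper, not taken from the OSR scripts.
# Succeeds (exit status 0) if OPTION is set to "=y" (built in) in CONFIG_FILE,
# fails if it is "=m", unset, or missing.
is_builtin() {
    option=$1
    config_file=$2
    grep -q "^${option}=y\$" "$config_file"
}

# Usage (config path as suggested in the thread, option name as an example):
#   if is_builtin CONFIG_OCFS2_FS /lib/modules/build/.config; then
#       echo "ocfs2 is built in, a failed modprobe can be ignored"
#   fi
```

A module loader could call such a check before treating a failed modprobe as an error.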
From: Marc G. <gr...@at...> - 2009-04-29 12:57:52
|
Hi Stefano,

On Wednesday 29 April 2009 11:26:02 Stefano Elmopi wrote:
> Hi Marc,
>
> I created a two-node OCFS2 cluster and now I need some information.
> - To add a new IP address to the cluster, what should I do ?
> I have tried to change the file cluster.conf but it seems that the
> change will not be read.
> I tried also the procedure for updating the version of the file
> cluster.conf but nothing.

You just need to update the cluster.conf with the new node and <com_info>
section.
Then build a new initrd:
mkinitrd ..
then create a new ocfscluster.conf:
com-queryclusterconf convert ocfs2 > /etc/ocfs2/cluster.conf
Do whatever ocfs2 needs to do when nodescount have been updated.
Then create the cdsl infrastructure:
cp -a /cluster/cdsl/default /cluster/cdsl/<newnodeid>
Adapt the hostdependent files (ip address ..)
Boot the third node.

>
> - What changes need to update the image file initrd ?
>
> - I saw that there were some updates in packages rpm, I can do update
> my servers with the new packages,
> or it might create problems ?

It shouldn't ;-) These are preview rpms. So you might think about stabilizing
your current approach and then update to new versions.

Marc.
--
Gruss / Regards,

Marc Grimme
http://www.atix.de/ http://www.open-sharedroot.org/
|
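[Editor's sketch] Of the steps listed above, the cdsl copy for the new node is the one most easily scripted. The following is a minimal sketch, assuming the /cluster/cdsl layout from the message; the function name create_cdsl_for_node and the root-prefix argument are hypothetical additions so that the copy can be tried on a scratch directory first. The mkinitrd and com-queryclusterconf steps are left out because their arguments are not fully given in the message.

```shell
# Hypothetical helper, not taken from the OSR tools.
# Copies the default cdsl tree to the tree of a new node id and refuses
# to overwrite an existing one. ROOT is a prefix: "" on a live system,
# or a scratch directory when trying this out.
create_cdsl_for_node() {
    root=$1
    nodeid=$2
    src="$root/cluster/cdsl/default"
    dst="$root/cluster/cdsl/$nodeid"
    if [ ! -d "$src" ]; then
        echo "missing $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "$dst already exists" >&2
        return 1
    fi
    cp -a "$src" "$dst"
}

# Usage on a live system (node id 3 is only an example):
#   create_cdsl_for_node "" 3
```

The host-dependent files (IP address and so on) under the new directory still have to be adapted by hand, as the message says.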
From: Stefano E. <ste...@so...> - 2009-04-29 09:59:27
|
Hi Marc,

I created a two-node OCFS2 cluster and now I need some information.
- To add a new IP address to the cluster, what should I do ?
I have tried to change the file cluster.conf but it seems that the
change will not be read.
I tried also the procedure for updating the version of the file
cluster.conf but nothing.

- What changes need to update the image file initrd ?

- I saw that there were some updates in packages rpm, I can do update
my servers with the new packages,
or it might create problems ?

Thanks

Ing. Stefano Elmopi
Gruppo Darco - Area ICT Sistemi
Via Ostiense 131/L Corpo B, 00154 Roma
cell. 3466147165
tel. 0657060500
email:ste...@so...
|
From: Marc G. <gr...@at...> - 2009-04-29 06:12:48
|
On Wednesday 29 April 2009 00:21:25 Dan Magenheimer wrote:
> FYI, the FATAL messages appear to be harmless and I have
> now successfully booted to a login prompt, then booted
> a second and third VM using the OSR also to login prompts!

Great. Sound good.

May I ask what kind of FATAL messages you get? The ones during module loading?

>
> Thanks very much for your help!
>
> I *think* an ocfs2 kernel patch is required, or possibly
> a "-o heartbeat=local" option might need to be specified.

This I didn't really understand. Cause for me it worked without something like
this.

> Let me reproduce everything and check on that. If you
> have any changed/newer rpm's I should use, please
> send them or send download URLs. Otherwise, I will
> use this list and hack in any changes needed.

I only know of the one with the distribution detection. This will go upstream
with the next release (comoonics-bootimage-1.4-21).

You're welcome.
Marc.

>
> Thanks,
> Dan
>
> comoonics-bootimage-1.4-19.noarch.rpm
> comoonics-bootimage-extras-ocfs2-0.1-3.noarch.rpm
> comoonics-bootimage-extras-xen-0.1-5.noarch.rpm
> comoonics-bootimage-initscripts-1.4-9.rhel5.noarch.rpm
> comoonics-bootimage-listfiles-1.3-8.el5.noarch.rpm
> comoonics-bootimage-listfiles-all-0.1-5.noarch.rpm
> comoonics-bootimage-listfiles-rhel-0.1-3.noarch.rpm
> comoonics-bootimage-listfiles-rhel5-0.1-3.noarch.rpm
> comoonics-cdsl-py-0.2-12.noarch.rpm
> comoonics-cluster-py-0.1-17.noarch.rpm
> comoonics-cs-py-0.1-56.noarch.rpm
> comoonics-pythonosfix-py-0.1-2.noarch.rpm
> SysVinit-comoonics-2.86-14.atix.1.i386.rpm

--
Gruss / Regards,

Marc Grimme
http://www.atix.de/ http://www.open-sharedroot.org/
|
From: Dan M. <dan...@or...> - 2009-04-28 22:22:04
|
FYI, the FATAL messages appear to be harmless and I have
now successfully booted to a login prompt, then booted
a second and third VM using the OSR also to login prompts!

Thanks very much for your help!

I *think* an ocfs2 kernel patch is required, or possibly
a "-o heartbeat=local" option might need to be specified.
Let me reproduce everything and check on that. If you
have any changed/newer rpm's I should use, please
send them or send download URLs. Otherwise, I will
use this list and hack in any changes needed.

Thanks,
Dan

comoonics-bootimage-1.4-19.noarch.rpm
comoonics-bootimage-extras-ocfs2-0.1-3.noarch.rpm
comoonics-bootimage-extras-xen-0.1-5.noarch.rpm
comoonics-bootimage-initscripts-1.4-9.rhel5.noarch.rpm
comoonics-bootimage-listfiles-1.3-8.el5.noarch.rpm
comoonics-bootimage-listfiles-all-0.1-5.noarch.rpm
comoonics-bootimage-listfiles-rhel-0.1-3.noarch.rpm
comoonics-bootimage-listfiles-rhel5-0.1-3.noarch.rpm
comoonics-cdsl-py-0.2-12.noarch.rpm
comoonics-cluster-py-0.1-17.noarch.rpm
comoonics-cs-py-0.1-56.noarch.rpm
comoonics-pythonosfix-py-0.1-2.noarch.rpm
SysVinit-comoonics-2.86-14.atix.1.i386.rpm

> -----Original Message-----
> From: Dan Magenheimer
> Sent: Tuesday, April 28, 2009 12:01 PM
> To: Marc Grimme; ope...@li...
> Subject: RE: [OSR-users] New Preview RPMs for next Release
> Candidate of comoonics-bootimage
>
> > you might try the patch or file itself attached.
>
> Yes, this patch seems to get the distribution set properly.
> clutype still gets set to gfs... is that OK even if
> I'm using ocfs2?
>
> > /etc/xen should only be tested for Dom0 "NOT" DomU.
>
> I think this problem goes away with the correct distro
> setting.
>
> > > And why when "Loading modules for all found network cards"
> > > do I get "FATAL: Module xennet not found"? I have xen
> > > networking compiled into my kernel so there is no module
> > > for it. Should this be fatal?
> > Hm. Up to now it seems to be ;-) . Nowerdays I only saw
> > kernels which are modularized. So this is a usecase where the
> > errordetection detects a nonexistant error. I'll have to think
> > about it.
>
> Perhaps also check lib/modules/build/.config to see if the
> config option is set to "=y"?
>
> I am now also seeing FATAL error reports when trying to
> load the ocfs2 modules, scsi, dm, and others. I have all
> of these compiled into my kernel.
>
> > -----Original Message-----
> > From: Marc Grimme [mailto:gr...@at...]
> > Sent: Tuesday, April 28, 2009 9:58 AM
> > To: ope...@li...
> > Cc: Dan Magenheimer
> > Subject: Re: [OSR-users] New Preview RPMs for next Release
> > Candidate of comoonics-bootimage
> >
> > Hi Dan,
> > you might try the patch or file itself attached.
> >
> > Do a
> > --------------------------------X8-------------------------------------
> > source /opt/atix/comoonics/bootimage/boot-scripts/etc/std-lib.sh
> > sourceLibs /opt/atix/comoonics/bootimage/boot-script
> > sourceRootfsLibs /opt/atix/comoonics/bootimage/boot-script
> > getDistributionList
> > --------------------------------X8-------------------------------------
> >
> > and let me know what the output is.
> >
> > Regards Marc.
> >
> > On Tuesday 28 April 2009 17:29:16 Dan Magenheimer wrote:
> > > Hi Marc --
> > >
> > > As you probably know, Oracle's Enterprise Linux (EL)
> > > is a "clone" of Red Hat Enterprise Linux (RHEL) and is
> > > essentially identical except for bug fixes. I don't
> > > know if OSR needs to distinguish between EL and RHEL,
> > > but I'm sure you know if they do.
> > >
> > > I'm told that you can distinguish between EL5 and RHEL5
> > > for RH/EL5ga, RH/EL5u1 and RH/EL5u2 because the file
> > > /etc/redhat-release has
> > >
> > > "Enterprise Linux Enterprise Linux release 5.X (codename)"
> > >
> > > in EL but has
> > >
> > > "Red Hat Enterprise Linux release 5.x (different_codename)"
> > >
> > > in RHEL.
> > >
> > > (Sorry, I don't know all the codenames.)
> > >
> > > Also, the file /etc/enterprise-release exists on EL but
> > > not on RHEL and has the same contents as /etc/redhat-release.
> > >
> > > HOWEVER, STARTING IN RH/EL5u3, this changes. The file
> > > /etc/redhat-release is the SAME for EL5u3 and RHEL5u3:
> > >
> > > "Red Hat Enterprise Linux release 5.3 (codename)"
> > >
> > > And for EL5u3, the file /etc/enterprise-release is different
> > > than /etc/redhat-release. For EL5u3, /etc/enterprise-release has
> > >
> > > "Enterprise Linux Enterprise Linux release 5.3 (codename)"
> > >
> > > but /etc/redhat-release has:
> > >
> > > "Red Hat Enterprise Linux release 5.3 (different_codename)"
> > >
> > > It looks to me like the OSR scripts already distinguish
> > > between EL5 and RHEL5, so there is probably a bug somewhere.
> > >
> > > Hope that helps!
> > >
> > > Thanks,
> > > Dan
> > >
> > > > -----Original Message-----
> > > > From: Marc Grimme [mailto:gr...@at...]
> > > > Sent: Tuesday, April 28, 2009 1:05 AM
> > > > To: Dan Magenheimer
> > > > Cc: ope...@li...
> > > > Subject: Re: [OSR-users] New Preview RPMs for next Release
> > > > Candidate of comoonics-bootimage
> > > >
> > > > On Tuesday 28 April 2009 02:51:49 Dan Magenheimer wrote:
> > > > > Attached is the output from the messages command to
> > > > > the rescueshell for a freshly created initrd (with
> > > > > a smaller lib/modules).
> > > > >
> > > > > Some messages I see on the console which do not appear
> > > > > in the "messages" output:
> > > > >
> > > > > Detecting Hardware ./etc/hardware-lib.sh: line 458:
> > > > > unknown_hardware_detect: command not found
> > > >
> > > > That's the wrong distribution detection.
> > > > This function is called as ${distribution}_hardware_detect.
> > > > Which will fail in your case. Send me a cat /etc/*-release
> > > > and ls -1 /etc/*-release and I'll make a patch for it.
> > > >
> > > > > Loading modules for all found network cardsFATAL: Module
> > > > > xennet not found.
> > > > > error: "xen.independent_wallclock" is an unknown key
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Dan Magenheimer
> > > > > > Sent: Monday, April 27, 2009 6:14 PM
> > > > > > To: Marc Grimme; ope...@li...
> > > > > > Subject: Re: [OSR-users] New Preview RPMs for next Release
> > > > > > Candidate of comoonics-bootimage
> > > > > >
> > > > > > Hi Marc --
> > > > > >
> > > > > > Thanks for the reply. First, let me clarify that I am
> > > > > > trying two different approaches:
> > > > > >
> > > > > > (A) Use your entire OSR RHEL5+OCFS2 howto (with
> > > > > > a Xen-paravirtualized 2.6.29 kernel running as a
> > > > > > guest on Xen-3.4.0), or
> > > > > > (B) Build my own initrd and use your OSR scripts
> > > > > > only after my initrd finishes.
> > > > > >
> > > > > > I couldn't get (A) to work... the initrd built using
> > > > > > the howto was failing very early, so I decided to
> > > > > > try with (B). I hoped that (B) would be easier because
> > > > > > your code is very general to handle many different
> > > > > > kinds of systems, and mine could be much more specific.
> > > > > >
> > > > > > HOWEVER, I just discovered one problem with (A).
> > > > > > Your mkinitrd process builds a huge (200MB) initrd
> > > > > > and there appears to be a BUG IN XEN that fails
> > > > > > to load large initrds (larger than about 100MB)!
> > > > > >
> > > > > > Your initrd is so large because my lib/modules/2.6.29
> > > > > > is very large. If I delete that from initrd built
> > > > > > using the howto, the initrd.gz is only about 24MB
> > > > > > and I am able to boot and see the ATIX logo and
> > > > > > it drops into the rescue shell (because I haven't
> > > > > > specified the MAC-Addresses I think). However other
> > > > > > boot errors complain about modules that are missing.
> > > > > > I will try to build a kernel with fewer modules and
> > > > > > see how that goes.
> > > > > >
> > > > > > Other feedback:
> > > > > >
> > > > > > I wonder if your script detecting "distribution"
> > > > > > and "shortdistribution" are correct for all versions
> > > > > > of RHEL and Oracle Enterprise Linux? I am getting
> > > > > > "unknown" for both (from listparameters in the rescue
> > > > > > shell), though I am booting Oracle Enterprise Linux 5
> > > > > > update 2.
> > > > > >
> > > > > > I also see that your test for detecting xen in xen-lib.sh
> > > > > > is not very good. You may want to test for /proc/xen
> > > > > > instead of (or in addition to) /etc/xen.
> > > > > >
> > > > > > And why when "Loading modules for all found network cards"
> > > > > > do I get "FATAL: Module xennet not found"? I have xen
> > > > > > networking compiled into my kernel so there is no module
> > > > > > for it. Should this be fatal?
> > > > > >
> > > > > > > Ok. But still we need the mac adress for detecting the nodes
> > > > > > > identity in the /etc/cluster/cluster.conf.
> > > > > >
> > > > > > Is that really necessary? Ocfs2 only requires the
> > > > > > node name, not the mac address. But I guess I can
> > > > > > configure the mac address in the xen guest config file
> > > > > > so I can live with this.
> > > > > >
> > > > > > Thanks,
> > > > > > Dan
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Marc Grimme [mailto:gr...@at...]
> > > > > > > Sent: Monday, April 27, 2009 12:52 AM
> > > > > > > To: ope...@li...
> > > > > > > Cc: Dan Magenheimer
> > > > > > > Subject: Re: [OSR-users] New Preview RPMs for next Release
> > > > > > > Candidate of comoonics-bootimage
> > > > > > >
> > > > > > > On Friday 24 April 2009 00:42:17 Dan Magenheimer wrote:
> > > > > > > > Hi Marc --
> > > > > > > >
> > > > > > > > Thanks for the help. I got past my rpm problems and am
> > > > > > > > now much further along but have hit another roadblock.
> > > > > > > >
> > > > > > > > First, FYI, I am using a different approach then the
> > > > > > > > RHEL5 OCFS2 Shared Root Mini Howto, because of the
> > > > > > > > way that I want to use the shared root. Specifically,
> > > > > > > > I am first building and booting a root ocfs2 filesystem,
> > > > > > > > using my own kernel (2.6.29) and my own initrd.img. With
> > > > > > > > this (and before I install any OSR stuff), I am able
> > > > > > > > to boot it as a Xen paravirtualized guest, using
> > > > > > > > the Xen kernel= and ramdisk= config options; the root
> > > > > > > > disk is NOT an LVM because I don't need a /boot. The
> > > > > > >
> > > > > > > No problem with this my osr-ocfs2 cluster is running exactly
> > > > > > > the same. No LVM, direct boot via kernel and initrd.
> > > > > > > That should not be a problem.
> > > > > > >
> > > > > > > > 2.6.29 kernel has CONFIG_IP_PNP and I pass in the
> > > > > > > > IP address and hostname from the Xen config file.
> > > > > > > > This all seems to work fine (for a single node).
> > > > > > >
> > > > > > > Ok. But still we need the mac adress for detecting the nodes
> > > > > > > identity in the /etc/cluster/cluster.conf.
> > > > > > > Also you might try to set the onboot flag in the cluster.conf
> > > > > > > at the nic config to "no":
> > > > > > > <com_info>
> > > > > > > ..
> > > > > > > <eth name="eth0" mac="..." onboot="no"/>
> > > > > > > ..
> > > > > > > </com_info>
> > > > > > > I didn't test it yet (it not YET in my testcases) but I'm
> > > > > > > pretty confident it should work.
> > > > > > >
> > > > > > > > Next, I install the OSR rpms directly in the running
> > > > > > > > ocfs2-root guest, then shut it down.
> > > > > > >
> > > > > > > Ok. so far so good.
> > > > > > >
> > > > > > > > Next, I mount the ocfs2-root-disk from another guest
> > > > > > > > (that also has the OSR rpms installed) and follow
> > > > > > > > the howto steps to create the cdsl infrastructure
> > > > > > > > and links. Then I shut down the other guest.
> > > > > > >
> > > > > > > Could you recall the exact steps and outputs?
> > > > > > >
> > > > > > > > Next, I try to boot the OSR-modified ocfs2-root guest,
> > > > > > > > but it has problems. It appears that /var doesn't
> > > > > > > > exist as I get many messages such as:
> > > > > > >
> > > > > > > The not mouting of /var is very strange. It should put you in
> > > > > > > a rescue shell.
> > > > > > > Then type messages and send me the output.
> > > > > > >
> > > > > > > > /etc/rc.d/rc.sysinit: /var/log/dmesg: No such file or
> > > > > > > > directory
> > > > > > >
> > > > > > > That's just before it trys to boot. That's somehow to
> > > > > > > far advanced.
> > > > > > >
> > > > > > > > and then the boot process seems to hang trying to start the
> > > > > > > > System Logger. No, it just takes a very long time and
> > > > > > > > eventually I get to a login prompt. (Or I can boot
> > > > > > > > single-user mode and get the same error messages, but
> > > > > > > > get to a bash prompt.)
> > > > > > > >
> > > > > > > > With a "ls -l /var", I see:
> > > > > > > >
> > > > > > > > lrwxrwxrwx 1 root root 14 <date> /var -> cdsl.local/var
> > > > > > >
> > > > > > > That's perfectly ok.
> > > > > > >
> > > > > > > > (Note no leading / before cdsl.local)
> > > > > > > >
> > > > > > > > but "ls -ld /cdsl.local" shows it is empty.
> > > > > > >
> > > > > > > That's strange.
> > > > > > >
> > > > > > > > Browsing around, I see that /cluster/cdsl is populated
> > > > > > > > (with subdirectories 0 ... 7 and default) and each has
> > > > > > > > an etc and a var subdirectory. /cluster/shared has
> > > > > > > > a var subdirectory and a var/lib subdirectory.
> > > > > > >
> > > > > > > That's again perfectly ok.-
> > > > > > >
> > > > > > > > So I'm guessing that cdsl.local should somehow be
> > > > > > > > linked to /cluster but isn't. True?
> > > > > > >
> > > > > > > Right but it is not liked but bind mounted. That means:
> > > > > > > mount --bind /cluster/cdsl/<nodeid> /cdsl.local
> > > > > > > but that's done in the initrd automatically so you should not
> > > > > > > have to bother about that.
> > > > > > >
> > > > > > > > One other thing I should mention... since my cluster.conf
> > > > > > > > has 8 nodes numbered 0 to 7, in the "mount --bind"
> > > > > > > > command during the cdsl setup steps, I used cluster/cdsl/0
> > > > > > > > instead of cluster/cdsl/1 to bind to cdsl.local.
> > > > > > >
> > > > > > > How did you "use" that. That should be done automatically
> > > > > > > shouldn't it?
> > > > > > >
> > > > > > > > Any ideas? Maybe your initrd creates some necessary links
> > > > > > > > and mine does not? (I tried booting with your initrd,
> > > > > > > > but my ocfs2-root failed to mount giving a kernel panic...
> > > > > > > > have you tested with linux-2.6.29? The error message
> > > > > > > > "Heartbeat has to be started to mount a read-write
> > > > > > > > clustered device" looks like it comes from a somewhat
> > > > > > > > recent ocfs2 kernel patch I found here:
> > > > > > > >
> > > > > > > > http://www.mail-archive.com/ocf...@os.../msg00293.html
> > > > > > > >
> > > > > > > > and I worked around it by mounting with -o heartbeat=local)
> > > > > > > >
> > > > > > > > Sorry this is so long!
> > > > > > >
> > > > > > > How does your /etc/cluster/cluster.conf look like?
> > > > > >
> > > > > > --
> > > > > > Gruss / Regards,
> > > > > >
> > > > > > Marc Grimme
> > > > > > http://www.atix.de/ http://www.open-sharedroot.org/
> > > > >
> > > > > ------------------------------------------------------------------------------
> > > > > Register Now & Save for Velocity, the Web Performance & Operations
> > > > > Conference from O'Reilly Media. Velocity features a full day of
> > > > > expert-led, hands-on workshops and two days of sessions from industry
> > > > > leaders in dedicated Performance & Operations tracks. Use code vel09scf
> > > > > and Save an extra 15% before 5/3. http://p.sf.net/sfu/velocityconf
> > > > > _______________________________________________
> > > > > Open-sharedroot-users mailing list
> > > > > Ope...@li...
> > > > > https://lists.sourceforge.net/lists/listinfo/open-sharedroot-users
> > > >
> > > > --
> > > > Gruss / Regards,
> > > >
> > > > Marc Grimme
> > > > Phone: +49-89 452 3538-14
> > > > http://www.atix.de/ http://www.open-sharedroot.org/
> > > >
> > > > ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 |
> > > > 85716 Unterschleissheim | www.atix.de | www.open-sharedroot.org
> > > >
> > > > Registergericht: Amtsgericht Muenchen, Registernummer: HRB
> > > > 168930, USt.-Id.: DE209485962 | Vorstand: Marc Grimme,
> > > > Mark Hlawatschek, Thomas Merz (Vors.) |
> > > > Vorsitzender des Aufsichtsrats: Dr. Martin Buss
> > >
> > > ------------------------------------------------------------------------------
> > > Register Now & Save for Velocity, the Web Performance & Operations
> > > Conference from O'Reilly Media. Velocity features a full day of
> > > expert-led, hands-on workshops and two days of sessions from industry
> > > leaders in dedicated Performance & Operations tracks. Use code vel09scf
> > > and Save an extra 15% before 5/3. http://p.sf.net/sfu/velocityconf
> > > _______________________________________________
> > > Open-sharedroot-users mailing list
> > > Ope...@li...
> > > https://lists.sourceforge.net/lists/listinfo/open-sharedroot-users
> >
> > --
> > Gruss / Regards,
> >
> > Marc Grimme
> > Phone: +49-89 452 3538-14
> > http://www.atix.de/ http://www.open-sharedroot.org/
> >
> > ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 |
> > 85716 Unterschleissheim | www.atix.de | www.open-sharedroot.org
> >
> > Registergericht: Amtsgericht Muenchen, Registernummer: HRB
> > 168930, USt.-Id.: DE209485962 | Vorstand: Marc Grimme,
> > Mark Hlawatschek, Thomas Merz (Vors.) |
> > Vorsitzender des Aufsichtsrats: Dr. Martin Buss
> >
|
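[Editor's sketch] The release-file differences described in the quoted message (the file /etc/enterprise-release exists on Oracle Enterprise Linux but not on RHEL, while /etc/redhat-release can be identical on both from 5.3 on) suggest checking for the enterprise-release file first. The following is a minimal sketch of that idea only; the function name detect_el_or_rhel and the returned labels "el", "rhel" and "unknown" are hypothetical and are not the names used by the OSR getDistributionList code. The root-prefix argument is an addition so that the check can be tried on a scratch directory.

```shell
# Hypothetical helper, not taken from the OSR scripts.
# ROOT is a prefix: "" on a live system, or a scratch directory for testing.
detect_el_or_rhel() {
    root=$1
    if [ -f "$root/etc/enterprise-release" ]; then
        # Present on Oracle Enterprise Linux only, per the thread.
        echo "el"
    elif [ -f "$root/etc/redhat-release" ]; then
        echo "rhel"
    else
        echo "unknown"
    fi
}

# Usage on a live system:
#   distribution=$(detect_el_or_rhel "")
```

Testing for the existence of the file avoids relying on the contents of /etc/redhat-release, which according to the message no longer differ between the two distributions starting with 5.3.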
From: Dan M. <dan...@or...> - 2009-04-28 18:38:05
|
Also, attached is the output from mkinitrd... do any of
these messages indicate a problem?

Thanks,
Dan

> -----Original Message-----
> From: Dan Magenheimer
> Sent: Tuesday, April 28, 2009 12:01 PM
> To: Marc Grimme; ope...@li...
> Subject: RE: [OSR-users] New Preview RPMs for next Release
> Candidate of comoonics-bootimage
>
> > you might try the patch or file itself attached.
>
> Yes, this patch seems to get the distribution set properly.
> clutype still gets set to gfs... is that OK even if
> I'm using ocfs2?
>
> > /etc/xen should only be tested for Dom0 "NOT" DomU.
>
> I think this problem goes away with the correct distro
> setting.
>
> > > And why when "Loading modules for all found network cards"
> > > do I get "FATAL: Module xennet not found"? I have xen
> > > networking compiled into my kernel so there is no module
> > > for it. Should this be fatal?
> > Hm. Up to now it seems to be ;-) . Nowerdays I only saw
> > kernels which are modularized. So this is a usecase where the
> > errordetection detects a nonexistant error. I'll have to think
> > about it.
>
> Perhaps also check lib/modules/build/.config to see if the
> config option is set to "=y"?
>
> I am now also seeing FATAL error reports when trying to
> load the ocfs2 modules, scsi, dm, and others. I have all
> of these compiled into my kernel.
>
> > -----Original Message-----
> > From: Marc Grimme [mailto:gr...@at...]
> > Sent: Tuesday, April 28, 2009 9:58 AM
> > To: ope...@li...
> > Cc: Dan Magenheimer
> > Subject: Re: [OSR-users] New Preview RPMs for next Release
> > Candidate of comoonics-bootimage
> >
> > Hi Dan,
> > you might try the patch or file itself attached.
> >
> > Do a
> > --------------------------------X8-------------------------------------
> > source /opt/atix/comoonics/bootimage/boot-scripts/etc/std-lib.sh
> > sourceLibs /opt/atix/comoonics/bootimage/boot-script
> > sourceRootfsLibs /opt/atix/comoonics/bootimage/boot-script
> > getDistributionList
> > --------------------------------X8-------------------------------------
> >
> > and let me know what the output is.
> >
> > Regards Marc.
> >
> > On Tuesday 28 April 2009 17:29:16 Dan Magenheimer wrote:
> > > Hi Marc --
> > >
> > > As you probably know, Oracle's Enterprise Linux (EL)
> > > is a "clone" of Red Hat Enterprise Linux (RHEL) and is
> > > essentially identical except for bug fixes. I don't
> > > know if OSR needs to distinguish between EL and RHEL,
> > > but I'm sure you know if they do.
> > >
> > > I'm told that you can distinguish between EL5 and RHEL5
> > > for RH/EL5ga, RH/EL5u1 and RH/EL5u2 because the file
> > > /etc/redhat-release has
> > >
> > > "Enterprise Linux Enterprise Linux release 5.X (codename)"
> > >
> > > in EL but has
> > >
> > > "Red Hat Enterprise Linux release 5.x (different_codename)"
> > >
> > > in RHEL.
> > >
> > > (Sorry, I don't know all the codenames.)
> > >
> > > Also, the file /etc/enterprise-release exists on EL but
> > > not on RHEL and has the same contents as /etc/redhat-release.
> > >
> > > HOWEVER, STARTING IN RH/EL5u3, this changes. The file
> > > /etc/redhat-release is the SAME for EL5u3 and RHEL5u3:
> > >
> > > "Red Hat Enterprise Linux release 5.3 (codename)"
> > >
> > > And for EL5u3, the file /etc/enterprise-release is different
> > > than /etc/redhat-release. For EL5u3, /etc/enterprise-release has
> > >
> > > "Enterprise Linux Enterprise Linux release 5.3 (codename)"
> > >
> > > but /etc/redhat-release has:
> > >
> > > "Red Hat Enterprise Linux release 5.3 (different_codename)"
> > >
> > > It looks to me like the OSR scripts already distinguish
> > > between EL5 and RHEL5, so there is probably a bug somewhere.
> > >
> > > Hope that helps!
> > >
> > > Thanks,
> > > Dan
> > >
> > > > -----Original Message-----
> > > > From: Marc Grimme [mailto:gr...@at...]
> > > > Sent: Tuesday, April 28, 2009 1:05 AM
> > > > To: Dan Magenheimer
> > > > Cc: ope...@li...
> > > > Subject: Re: [OSR-users] New Preview RPMs for next Release
> > > > Candidate of comoonics-bootimage
> > > >
> > > > On Tuesday 28 April 2009 02:51:49 Dan Magenheimer wrote:
> > > > > Attached is the output from the messages command to
> > > > > the rescueshell for a freshly created initrd (with
> > > > > a smaller lib/modules).
> > > > >
> > > > > Some messages I see on the console which do not appear
> > > > > in the "messages" output:
> > > > >
> > > > > Detecting Hardware ./etc/hardware-lib.sh: line 458:
> > > > > unknown_hardware_detect: command not found
> > > >
> > > > That's the wrong distribution detection.
> > > > This function is called as ${distribution}_hardware_detect.
> > > > Which will fail in your case. Send me a cat /etc/*-release
> > > > and ls -1 /etc/*-release and I'll make a patch for it.
> > > >
> > > > > Loading modules for all found network cardsFATAL: Module
> > > > > xennet not found.
> > > > > error: "xen.independent_wallclock" is an unknown key
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Dan Magenheimer
> > > > > > Sent: Monday, April 27, 2009 6:14 PM
> > > > > > To: Marc Grimme; ope...@li...
> > > > > > Subject: Re: [OSR-users] New Preview RPMs for next Release
> > > > > > Candidate of comoonics-bootimage
> > > > > >
> > > > > > Hi Marc --
> > > > > >
> > > > > > Thanks for the reply. First, let me clarify that I am
> > > > > > trying two different approaches:
> > > > > >
> > > > > > (A) Use your entire OSR RHEL5+OCFS2 howto (with
> > > > > > a Xen-paravirtualized 2.6.29 kernel running as a
> > > > > > guest on Xen-3.4.0), or
> > > > > > (B) Build my own initrd and use your OSR scripts
> > > > > > only after my initrd finishes.
> > > > > >
> > > > > > I couldn't get (A) to work... the initrd built using
> > > > > > the howto was failing very early, so I decided to
> > > > > > try with (B). I hoped that (B) would be easier because
> > > > > > your code is very general to handle many different
> > > > > > kinds of systems, and mine could be much more specific.
> > > > > >
> > > > > > HOWEVER, I just discovered one problem with (A).
> > > > > > Your mkinitrd process builds a huge (200MB) initrd
> > > > > > and there appears to be a BUG IN XEN that fails
> > > > > > to load large initrds (larger than about 100MB)!
> > > > > >
> > > > > > Your initrd is so large because my lib/modules/2.6.29
> > > > > > is very large. If I delete that from initrd built
> > > > > > using the howto, the initrd.gz is only about 24MB
> > > > > > and I am able to boot and see the ATIX logo and
> > > > > > it drops into the rescue shell (because I haven't
> > > > > > specified the MAC-Addresses I think). However other
> > > > > > boot errors complain about modules that are missing.
> > > > > > I will try to build a kernel with fewer modules and
> > > > > > see how that goes.
> > > > > >
> > > > > > Other feedback:
> > > > > >
> > > > > > I wonder if your script detecting "distribution"
> > > > > > and "shortdistribution" are correct for all versions
> > > > > > of RHEL and Oracle Enterprise Linux? I am getting
> > > > > > "unknown" for both (from listparameters in the rescue
> > > > > > shell), though I am booting Oracle Enterprise Linux 5
> > > > > > update 2.
> > > > > >
> > > > > > I also see that your test for detecting xen in xen-lib.sh
> > > > > > is not very good. You may want to test for /proc/xen
> > > > > > instead of (or in addition to) /etc/xen.
> > > > > >
> > > > > > And why when "Loading modules for all found network cards"
> > > > > > do I get "FATAL: Module xennet not found"? I have xen
> > > > > > networking compiled into my kernel so there is no module
> > > > > > for it. Should this be fatal?
> > > > > >
> > > > > > > Ok. But still we need the mac adress for detecting the nodes
> > > > > > > identity in the /etc/cluster/cluster.conf.
> > > > > >
> > > > > > Is that really necessary? Ocfs2 only requires the
> > > > > > node name, not the mac address. But I guess I can
> > > > > > configure the mac address in the xen guest config file
> > > > > > so I can live with this.
> > > > > >
> > > > > > Thanks,
> > > > > > Dan
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Marc Grimme [mailto:gr...@at...]
> > > > > > > Sent: Monday, April 27, 2009 12:52 AM
> > > > > > > To: ope...@li...
> > > > > > > Cc: Dan Magenheimer
> > > > > > > Subject: Re: [OSR-users] New Preview RPMs for next Release
> > > > > > > Candidate of comoonics-bootimage
> > > > > > >
> > > > > > > On Friday 24 April 2009 00:42:17 Dan Magenheimer wrote:
> > > > > > > > Hi Marc --
> > > > > > > >
> > > > > > > > Thanks for the help. I got past my rpm problems and am
> > > > > > > > now much further along but have hit another roadblock.
> > > > > > > >
> > > > > > > > First, FYI, I am using a different approach then the
> > > > > > > > RHEL5 OCFS2 Shared Root Mini Howto, because of the
> > > > > > > > way that I want to use the shared root. Specifically,
> > > > > > > > I am first building and booting a root ocfs2 filesystem,
> > > > > > > > using my own kernel (2.6.29) and my own initrd.img. With
> > > > > > > > this (and before I install any OSR stuff), I am able
> > > > > > > > to boot it as a Xen paravirtualized guest, using
> > > > > > > > the Xen kernel= and ramdisk= config options; the root
> > > > > > > > disk is NOT an LVM because I don't need a /boot. The
> > > > > > >
> > > > > > > No problem with this my osr-ocfs2 cluster is running exactly
> > > > > > > the same. No LVM, direct boot via kernel and initrd.
> > > > > > > That should not be a problem.
> > > > > > >
> > > > > > > > 2.6.29 kernel has CONFIG_IP_PNP and I pass in the
> > > > > > > > IP address and hostname from the Xen config file.
> > > > > > > > This all seems to work fine (for a single node).
> > > > > > >
> > > > > > > Ok. But still we need the mac adress for detecting the nodes
> > > > > > > identity in the /etc/cluster/cluster.conf.
> > > > > > > Also you might try to set the onboot flag in the cluster.conf
> > > > > > > at the nic config to "no":
> > > > > > > <com_info>
> > > > > > > ..
> > > > > > > <eth name="eth0" mac="..." onboot="no"/>
> > > > > > > ..
> > > > > > > </com_info>
> > > > > > > I didn't test it yet (it not YET in my testcases) but I'm
> > > > > > > pretty confident it should work.
> > > > > > >
> > > > > > > > Next, I install the OSR rpms directly in the running
> > > > > > > > ocfs2-root guest, then shut it down.
> > > > > > >
> > > > > > > Ok. so far so good.
> > > > > > >
> > > > > > > > Next, I mount the ocfs2-root-disk from another guest
> > > > > > > > (that also has the OSR rpms installed) and follow
> > > > > > > > the howto steps to create the cdsl infrastructure
> > > > > > > > and links. Then I shut down the other guest.
> > > > > > >
> > > > > > > Could you recall the exact steps and outputs?
> > > > > > > > > > > > > > > Next, I try to boot the OSR-modified ocfs2-root guest, > > > > > > > > but it has problems. It appears that /var doesn't > > > > > > > > exist as I get many messages such as: > > > > > > > > > > > > > > The not mouting of /var is very strange. It should > > put you in > > > > > > > a rescue shell. > > > > > > > Then type messages and send me the output. > > > > > > > > > > > > > > > /etc/rc.d/rc.sysinit: /var/log/dmesg: No such file or > > > > > > > > directory > > > > > > > > > > > That's just before it trys to boot. That's somehow to > > > > > > > > far advanced. > > > > > > > > > > > > and then the boot process seems to hang trying to > > start the > > > > > > > > System Logger. No, it just takes a very long time and > > > > > > > > eventually I get to a login prompt. (Or I can boot > > > > > > > > single-user mode and get the same error messages, but > > > > > > > > get to a bash prompt.) > > > > > > > > > > > > > > > > With a "ls -l /var", I see: > > > > > > > > > > > > > > > > lrwxrwxrwx 1 root root 14 <date> /var -> cdsl.local/var > > > > > > > > > > > > > > That's perfectly ok. > > > > > > > > > > > > > > > (Note no leading / before cdsl.local) > > > > > > > > > > > > > > > > but "ls -ld /cdsl.local" shows it is empty. > > > > > > > > > > > > > > That's strange. > > > > > > > > > > > > > > > Browsing around, I see that /cluster/cdsl is populated > > > > > > > > (with subdirectories 0 ... 7 and default) and each has > > > > > > > > an etc and a var subdirectory. /cluster/shared has > > > > > > > > a var subdirectory and a var/lib subdirectory. > > > > > > > > > > > > > > That's again perfectly ok.- > > > > > > > > > > > > > > > So I'm guessing that cdsl.local should somehow be > > > > > > > > linked to /cluster but isn't. True? > > > > > > > > > > > > > > Right but it is not liked but bind mounted. 
That means:
> > > > > > > mount --bind /cluster/cdsl/<nodeid> /cdsl.local
> > > > > > > but that's done in the initrd automatically, so you should not
> > > > > > > have to bother about that.
> > > > > > >
> > > > > > > > One other thing I should mention... since my cluster.conf
> > > > > > > > has 8 nodes numbered 0 to 7, in the "mount --bind"
> > > > > > > > command during the cdsl setup steps, I used cluster/cdsl/0
> > > > > > > > instead of cluster/cdsl/1 to bind to cdsl.local.
> > > > > > >
> > > > > > > How did you "use" that? That should be done automatically,
> > > > > > > shouldn't it?
> > > > > > >
> > > > > > > > Any ideas? Maybe your initrd creates some necessary links
> > > > > > > > and mine does not? (I tried booting with your initrd,
> > > > > > > > but my ocfs2-root failed to mount giving a kernel panic...
> > > > > > > > have you tested with linux-2.6.29? The error message
> > > > > > > > "Heartbeat has to be started to mount a read-write
> > > > > > > > clustered device" looks like it comes from a somewhat
> > > > > > > > recent ocfs2 kernel patch I found here:
> > > > > > > > http://www.mail-archive.com/ocf...@os.../msg00293.html
> > > > > > > > and I worked around it by mounting with -o heartbeat=local)
> > > > > > > >
> > > > > > > > Sorry this is so long!
> > > > > > >
> > > > > > > What does your /etc/cluster/cluster.conf look like?
> > > > > > >
> > > > > > > --
> > > > > > > Gruss / Regards,
> > > > > > >
> > > > > > > Marc Grimme
> > > > > > > http://www.atix.de/ http://www.open-sharedroot.org/
|
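The release-file rules described in the quoted messages (/etc/enterprise-release exists on Enterprise Linux but not on RHEL, and from 5.3 on /etc/redhat-release reads the same on both) could be checked roughly as follows. This is only a sketch: the function name detect_distribution and the return values "el", "rhel" and "unknown" are made up for illustration and are not taken from the OSR scripts.

```shell
#!/bin/sh
# Hypothetical helper, not part of the OSR scripts: guess the distribution
# from the release files found in a given etc directory (default: /etc).
detect_distribution() {
    etc_dir="${1:-/etc}"
    if [ -f "$etc_dir/enterprise-release" ]; then
        # Per the thread, this file exists on Enterprise Linux but not on
        # RHEL, so it is checked first. That also covers 5.3, where
        # redhat-release no longer tells the two apart.
        echo "el"
    elif [ -f "$etc_dir/redhat-release" ]; then
        case "$(cat "$etc_dir/redhat-release")" in
            "Enterprise Linux Enterprise Linux"*) echo "el" ;;
            "Red Hat Enterprise Linux"*)          echo "rhel" ;;
            *)                                    echo "unknown" ;;
        esac
    else
        echo "unknown"
    fi
}
```

Calling detect_distribution with no argument inspects /etc; passing a directory makes it easy to try against copies of the release files from another machine.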
From: Dan M. <dan...@or...> - 2009-04-28 18:01:51
|
> you might try the patch or file itself attached.

Yes, this patch seems to get the distribution set properly. clutype still gets set to gfs... is that OK even if I'm using ocfs2?

> /etc/xen should only be tested for Dom0 "NOT" DomU.

I think this problem goes away with the correct distro setting.

> > And why when "Loading modules for all found network cards"
> > do I get "FATAL: Module xennet not found"? I have xen
> > networking compiled into my kernel so there is no module
> > for it. Should this be fatal?
> Hm. Up to now it seems to be ;-) . Nowadays I have only seen kernels
> which are modularized. So this is a use case where the error detection
> detects a nonexistent error. I'll have to think about it.

Perhaps also check lib/modules/build/.config to see if the config option is set to "=y"?

I am now also seeing FATAL error reports when trying to load the ocfs2 modules, scsi, dm, and others. I have all of these compiled into my kernel.

> -----Original Message-----
> From: Marc Grimme [mailto:gr...@at...]
> Sent: Tuesday, April 28, 2009 9:58 AM
> To: ope...@li...
> Cc: Dan Magenheimer
> Subject: Re: [OSR-users] New Preview RPMs for next Release Candidate of comoonics-bootimage
>
> Hi Dan,
> you might try the patch or file itself attached.
>
> Do a
> --------------------------------X8-------------------------------------
> source /opt/atix/comoonics/bootimage/boot-scripts/etc/std-lib.sh
> sourceLibs /opt/atix/comoonics/bootimage/boot-script
> sourceRootfsLibs /opt/atix/comoonics/bootimage/boot-script
> getDistributionList
> --------------------------------X8-------------------------------------
>
> and let me know what the output is.
>
> Regards Marc.
>
> On Tuesday 28 April 2009 17:29:16 Dan Magenheimer wrote:
> > Hi Marc --
> >
> > As you probably know, Oracle's Enterprise Linux (EL)
> > is a "clone" of Red Hat Enterprise Linux (RHEL) and is
> > essentially identical except for bug fixes.
I don't > > know if OSR needs to distinguish between EL and RHEL, > > but I'm sure you know if they do. > > > > I'm told that you can distinguish between EL5 and RHEL5 > > for RH/EL5ga, RH/EL5u1 and RH/EL5u2 because the file > > /etc/redhat-release has > > > > "Enterprise Linux Enterprise Linux release 5.X (codename)" > > > > in EL but has > > > > "Red Hat Enterprise Linux release 5.x (different_codename)" > > > > in RHEL. > > > > (Sorry, I don't know all the codenames.) > > > > Also, the file /etc/enterprise-release exists on EL but > > not on RHEL and has the same contents as /etc/redhat-release. > > > > HOWEVER, STARTING IN RH/EL5u3, this changes. The file > > /etc/redhat-release is the SAME for EL5u3 and RHEL5u3: > > > > "Red Hat Enterprise Linux release 5.3 (codename)" > > > > And for EL5u3, the file /etc/enterprise-release is different > > than /etc/redhat-release. For EL5u3, /etc/enterprise-release has > > > > "Enterprise Linux Enterprise Linux release 5.3 (codename)" > > > > but /etc/redhat-release has: > > > > "Red Hat Enterprise Linux release 5.3 (different_codename)" > > > > It looks to me like the OSR scripts already distinguish > > between EL5 and RHEL5, so there is probably a bug somewhere. > > > > Hope that helps! > > > > Thanks, > > Dan > > > > > -----Original Message----- > > > From: Marc Grimme [mailto:gr...@at...] > > > Sent: Tuesday, April 28, 2009 1:05 AM > > > To: Dan Magenheimer > > > Cc: ope...@li... > > > Subject: Re: [OSR-users] New Preview RPMs for next Release > > > Candidate of > > > comoonics-bootimage > > > > > > On Tuesday 28 April 2009 02:51:49 Dan Magenheimer wrote: > > > > Attached is the output from the messages command to > > > > the rescueshell for a freshly created initrd (with > > > > a smaller lib/modules). 
> > > > > > > > Some messages I see on the console which do not appear > > > > in the "messages" output: > > > > > > > > Detecting Hardware ./etc/hardware-lib.sh: line 458: > > > > unknown_hardware_detect: command not found > > > > > > That's the wrong distribution detection. > > > This function is called as ${distribution}_hardware_detect. > > > Which will fail in > > > your case. Send me a cat /etc/*-release and ls -1 > > > /etc/*-release and I'll > > > make a patch for it. > > > > > > > Loading modules for all found network cardsFATAL: Module > > > > > > xennet not found. > > > > > > > error: "xen.independent_wallclock" is an unknown key > > > > > > > > > -----Original Message----- > > > > > From: Dan Magenheimer > > > > > Sent: Monday, April 27, 2009 6:14 PM > > > > > To: Marc Grimme; ope...@li... > > > > > Subject: Re: [OSR-users] New Preview RPMs for next Release > > > > > Candidate of > > > > > comoonics-bootimage > > > > > > > > > > > > > > > Hi Marc -- > > > > > > > > > > Thanks for the reply. First, let me clarify that I am > > > > > trying two different approaches: > > > > > > > > > > (A) Use your entire OSR RHEL5+OCFS2 howto (with > > > > > a Xen-paravirtualized 2.6.29 kernel running as a > > > > > guest on Xen-3.4.0), or > > > > > (B) Build my own initrd and use your OSR scripts > > > > > only after my initrd finishes. > > > > > > > > > > I couldn't get (A) to work... the initrd built using > > > > > the howto was failing very early, so I decided to > > > > > try with (B). I hoped that (B) would be easier because > > > > > your code is very general to handle many different > > > > > kinds of systems, and mine could be much more specific. > > > > > > > > > > HOWEVER, I just discovered one problem with (A). > > > > > Your mkinitrd process builds a huge (200MB) initrd > > > > > and there appears to be a BUG IN XEN that fails > > > > > to load large initrds (larger than about 100MB)! 
> > > > > > > > > > Your initrd is so large because my lib/modules/2.6.29 > > > > > is very large. If I delete that from initrd built > > > > > using the howto, the initrd.gz is only about 24MB > > > > > and I am able to boot and see the ATIX logo and > > > > > it drops into the rescue shell (because I haven't > > > > > specified the MAC-Addresses I think). However other > > > > > boot errors complain about modules that are missing. > > > > > I will try to build a kernel with fewer modules and > > > > > see how that goes. > > > > > > > > > > Other feedback: > > > > > > > > > > I wonder if your script detecting "distribution" > > > > > and "shortdistribution" are correct for all versions > > > > > of RHEL and Oracle Enterprise Linux? I am getting > > > > > "unknown" for both (from listparameters in the rescue > > > > > shell), though I am booting Oracle Enterprise Linux 5 > > > > > update 2. > > > > > > > > > > I also see that your test for detecting xen in xen-lib.sh > > > > > is not very good. You may want to test for /proc/xen > > > > > instead of (or in addition to) /etc/xen. > > > > > > > > > > And why when "Loading modules for all found network cards" > > > > > do I get "FATAL: Module xennet not found"? I have xen > > > > > networking compiled into my kernel so there is no module > > > > > for it. Should this be fatal? > > > > > > > > > > > Ok. But still we need the mac adress for detecting the nodes > > > > > > identity in the /etc/cluster/cluster.conf. > > > > > > > > > > Is that really necessary? Ocfs2 only requires the > > > > > node name, not the mac address. But I guess I can > > > > > configure the mac address in the xen guest config file > > > > > so I can live with this. > > > > > > > > > > Thanks, > > > > > Dan > > > > > > > > > > > -----Original Message----- > > > > > > From: Marc Grimme [mailto:gr...@at...] > > > > > > Sent: Monday, April 27, 2009 12:52 AM > > > > > > To: ope...@li... 
> > > > > > Cc: Dan Magenheimer > > > > > > Subject: Re: [OSR-users] New Preview RPMs for next Release > > > > > > Candidate of > > > > > > comoonics-bootimage > > > > > > > > > > > > On Friday 24 April 2009 00:42:17 Dan Magenheimer wrote: > > > > > > > Hi Marc -- > > > > > > > > > > > > > > Thanks for the help. I got past my rpm problems and am > > > > > > > now much further along but have hit another roadblock. > > > > > > > > > > > > > > First, FYI, I am using a different approach then the > > > > > > > RHEL5 OCFS2 Shared Root Mini Howto, because of the > > > > > > > way that I want to use the shared root. Specifically, > > > > > > > I am first building and booting a root ocfs2 filesystem, > > > > > > > using my own kernel (2.6.29) and my own initrd.img. With > > > > > > > this (and before I install any OSR stuff), I am able > > > > > > > to boot it as a Xen paravirtualized guest, using > > > > > > > the Xen kernel= and ramdisk= config options; the root > > > > > > > disk is NOT an LVM because I don't need a /boot. The > > > > > > > > > > > > No problem with this my osr-ocfs2 cluster is running exactly > > > > > > the same. No LVM, > > > > > > direct boot via kernel and initrd. That should not > be a problem. > > > > > > > > > > > > > 2.6.29 kernel has CONFIG_IP_PNP and I pass in the > > > > > > > IP address and hostname from the Xen config file. > > > > > > > This all seems to work fine (for a single node). > > > > > > > > > > > > Ok. But still we need the mac adress for detecting the nodes > > > > > > identity in > > > > > > the /etc/cluster/cluster.conf. > > > > > > Also you might try to set the onboot flag in the > cluster.conf > > > > > > at the nic > > > > > > config to "no": > > > > > > <com_info> > > > > > > .. > > > > > > <eth name="eth0" mac="..." onboot="no"/> > > > > > > .. > > > > > > </com_info> > > > > > > I didn't test it yet (it not YET in my testcases) but I'm > > > > > > pretty confident it > > > > > > should work. 
> > > > > > > > > > > > > Next, I install the OSR rpms directly in the running > > > > > > > ocfs2-root guest, then shut it down. > > > > > > > > > > > > Ok. so far so good. > > > > > > > > > > > > > Next, I mount the ocfs2-root-disk from another guest > > > > > > > (that also has the OSR rpms installed) and follow > > > > > > > the howto steps to create the cdsl infrastructure > > > > > > > and links. Then I shut down the other guest. > > > > > > > > > > > > Could you recall the exact steps and outputs? > > > > > > > > > > > > > Next, I try to boot the OSR-modified ocfs2-root guest, > > > > > > > but it has problems. It appears that /var doesn't > > > > > > > exist as I get many messages such as: > > > > > > > > > > > > The not mouting of /var is very strange. It should > put you in > > > > > > a rescue shell. > > > > > > Then type messages and send me the output. > > > > > > > > > > > > > /etc/rc.d/rc.sysinit: /var/log/dmesg: No such file or > > > > > > directory > > > > > > > > > That's just before it trys to boot. That's somehow to > > > > > > far advanced. > > > > > > > > > > and then the boot process seems to hang trying to > start the > > > > > > > System Logger. No, it just takes a very long time and > > > > > > > eventually I get to a login prompt. (Or I can boot > > > > > > > single-user mode and get the same error messages, but > > > > > > > get to a bash prompt.) > > > > > > > > > > > > > > With a "ls -l /var", I see: > > > > > > > > > > > > > > lrwxrwxrwx 1 root root 14 <date> /var -> cdsl.local/var > > > > > > > > > > > > That's perfectly ok. > > > > > > > > > > > > > (Note no leading / before cdsl.local) > > > > > > > > > > > > > > but "ls -ld /cdsl.local" shows it is empty. > > > > > > > > > > > > That's strange. > > > > > > > > > > > > > Browsing around, I see that /cluster/cdsl is populated > > > > > > > (with subdirectories 0 ... 7 and default) and each has > > > > > > > an etc and a var subdirectory. 
/cluster/shared has > > > > > > > a var subdirectory and a var/lib subdirectory. > > > > > > > > > > > > That's again perfectly ok.- > > > > > > > > > > > > > So I'm guessing that cdsl.local should somehow be > > > > > > > linked to /cluster but isn't. True? > > > > > > > > > > > > Right but it is not liked but bind mounted. That means: > > > > > > mount --bind /cluster/cdsl/<nodeid> /cdsl.local > > > > > > but that's done in the initrd automatically so you > should not > > > > > > have to bother > > > > > > about that. > > > > > > > > > > > > > One other thing I should mention... since my cluster.conf > > > > > > > has 8 nodes numbered 0 to 7, in the "mount --bind" > > > > > > > command during the cdsl setup steps, I used cluster/cdsl/0 > > > > > > > instead of cluster/cdsl/1 to bind to cdsl.local. > > > > > > > > > > > > How did you "use" that. That should be done automatically > > > > > > shouldn't it? > > > > > > > > > > > > > Any ideas? Maybe your initrd creates some necessary links > > > > > > > and mine does not? (I tried booting with your initrd, > > > > > > > but my ocfs2-root failed to mount giving a kernel panic... > > > > > > > have you tested with linux-2.6.29? The error message > > > > > > > "Heartbeat has to be started to mount a read-write > > > > > > > clustered device" looks like it comes from a somewhat > > > > > > > recent ocfs2 kernel patch I found here: > > > > > > > > > http://www.mail-archive.com/ocf...@os.../msg00293.html > > > > > > > > > > and I worked around it by mounting with -o heartbeat=local) > > > > > > > > > > > > Sorry this is so long! > > > > > > > > > > How does your /etc/cluster/cluster.conf look like? 
|
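The suggestion above, to look at the kernel .config before treating a failed module load as fatal, might look something like this. It is a sketch under assumptions: is_builtin is a hypothetical helper, not an OSR function; CONFIG_XEN_NETDEV_FRONTEND is the upstream option for the Xen network frontend and is used here only as an example; and the config file location (commonly /lib/modules/$(uname -r)/build/.config) is assumed and may well not exist inside an initrd.

```shell
#!/bin/sh
# Hypothetical helper: succeed (return 0) only if kernel config option $1
# is compiled in ("=y") according to the config file $2.
is_builtin() {
    option="$1"
    config="$2"
    # An unreadable or missing config file counts as "not built in".
    [ -r "$config" ] && grep -q "^${option}=y\$" "$config"
}

# Example use (assumed option name and path, adjust for the kernel at hand):
#   config="/lib/modules/$(uname -r)/build/.config"
#   if ! modprobe xennet 2>/dev/null; then
#       if is_builtin CONFIG_XEN_NETDEV_FRONTEND "$config"; then
#           echo "xennet is built into the kernel, nothing to load"
#       else
#           echo "FATAL: Module xennet not found"
#       fi
#   fi
```

An option set to "=m" or missing from the file is reported as not built in, so a genuinely missing driver would still be flagged.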
From: Marc G. <gr...@at...> - 2009-04-28 15:57:56
|
Hi Dan,
you might try the patch or file itself attached.

Do a
--------------------------------X8-------------------------------------
source /opt/atix/comoonics/bootimage/boot-scripts/etc/std-lib.sh
sourceLibs /opt/atix/comoonics/bootimage/boot-script
sourceRootfsLibs /opt/atix/comoonics/bootimage/boot-script
getDistributionList
--------------------------------X8-------------------------------------
and let me know what the output is.

Regards Marc.

On Tuesday 28 April 2009 17:29:16 Dan Magenheimer wrote:
> Hi Marc --
>
> As you probably know, Oracle's Enterprise Linux (EL)
> is a "clone" of Red Hat Enterprise Linux (RHEL) and is
> essentially identical except for bug fixes. I don't
> know if OSR needs to distinguish between EL and RHEL,
> but I'm sure you know if they do.
>
> I'm told that you can distinguish between EL5 and RHEL5
> for RH/EL5ga, RH/EL5u1 and RH/EL5u2 because the file
> /etc/redhat-release has
>
> "Enterprise Linux Enterprise Linux release 5.x (codename)"
>
> in EL but has
>
> "Red Hat Enterprise Linux release 5.x (different_codename)"
>
> in RHEL.
>
> (Sorry, I don't know all the codenames.)
>
> Also, the file /etc/enterprise-release exists on EL but
> not on RHEL and has the same contents as /etc/redhat-release.
>
> HOWEVER, STARTING IN RH/EL5u3, this changes. The file
> /etc/redhat-release is the SAME for EL5u3 and RHEL5u3:
>
> "Red Hat Enterprise Linux release 5.3 (codename)"
>
> And for EL5u3, the file /etc/enterprise-release is different
> than /etc/redhat-release. For EL5u3, /etc/enterprise-release has
>
> "Enterprise Linux Enterprise Linux release 5.3 (codename)"
>
> but /etc/redhat-release has:
>
> "Red Hat Enterprise Linux release 5.3 (different_codename)"
>
> It looks to me like the OSR scripts already distinguish
> between EL5 and RHEL5, so there is probably a bug somewhere.
>
> Hope that helps!
>
> Thanks,
> Dan
>
> > -----Original Message-----
> > From: Marc Grimme [mailto:gr...@at...]
> > Sent: Tuesday, April 28, 2009 1:05 AM > > To: Dan Magenheimer > > Cc: ope...@li... > > Subject: Re: [OSR-users] New Preview RPMs for next Release > > Candidate of > > comoonics-bootimage > > > > On Tuesday 28 April 2009 02:51:49 Dan Magenheimer wrote: > > > Attached is the output from the messages command to > > > the rescueshell for a freshly created initrd (with > > > a smaller lib/modules). > > > > > > Some messages I see on the console which do not appear > > > in the "messages" output: > > > > > > Detecting Hardware ./etc/hardware-lib.sh: line 458: > > > unknown_hardware_detect: command not found > > > > That's the wrong distribution detection. > > This function is called as ${distribution}_hardware_detect. > > Which will fail in > > your case. Send me a cat /etc/*-release and ls -1 > > /etc/*-release and I'll > > make a patch for it. > > > > > Loading modules for all found network cardsFATAL: Module > > > > xennet not found. > > > > > error: "xen.independent_wallclock" is an unknown key > > > > > > > -----Original Message----- > > > > From: Dan Magenheimer > > > > Sent: Monday, April 27, 2009 6:14 PM > > > > To: Marc Grimme; ope...@li... > > > > Subject: Re: [OSR-users] New Preview RPMs for next Release > > > > Candidate of > > > > comoonics-bootimage > > > > > > > > > > > > Hi Marc -- > > > > > > > > Thanks for the reply. First, let me clarify that I am > > > > trying two different approaches: > > > > > > > > (A) Use your entire OSR RHEL5+OCFS2 howto (with > > > > a Xen-paravirtualized 2.6.29 kernel running as a > > > > guest on Xen-3.4.0), or > > > > (B) Build my own initrd and use your OSR scripts > > > > only after my initrd finishes. > > > > > > > > I couldn't get (A) to work... the initrd built using > > > > the howto was failing very early, so I decided to > > > > try with (B). I hoped that (B) would be easier because > > > > your code is very general to handle many different > > > > kinds of systems, and mine could be much more specific. 
> > > > > > > > HOWEVER, I just discovered one problem with (A). > > > > Your mkinitrd process builds a huge (200MB) initrd > > > > and there appears to be a BUG IN XEN that fails > > > > to load large initrds (larger than about 100MB)! > > > > > > > > Your initrd is so large because my lib/modules/2.6.29 > > > > is very large. If I delete that from initrd built > > > > using the howto, the initrd.gz is only about 24MB > > > > and I am able to boot and see the ATIX logo and > > > > it drops into the rescue shell (because I haven't > > > > specified the MAC-Addresses I think). However other > > > > boot errors complain about modules that are missing. > > > > I will try to build a kernel with fewer modules and > > > > see how that goes. > > > > > > > > Other feedback: > > > > > > > > I wonder if your script detecting "distribution" > > > > and "shortdistribution" are correct for all versions > > > > of RHEL and Oracle Enterprise Linux? I am getting > > > > "unknown" for both (from listparameters in the rescue > > > > shell), though I am booting Oracle Enterprise Linux 5 > > > > update 2. > > > > > > > > I also see that your test for detecting xen in xen-lib.sh > > > > is not very good. You may want to test for /proc/xen > > > > instead of (or in addition to) /etc/xen. > > > > > > > > And why when "Loading modules for all found network cards" > > > > do I get "FATAL: Module xennet not found"? I have xen > > > > networking compiled into my kernel so there is no module > > > > for it. Should this be fatal? > > > > > > > > > Ok. But still we need the mac adress for detecting the nodes > > > > > identity in the /etc/cluster/cluster.conf. > > > > > > > > Is that really necessary? Ocfs2 only requires the > > > > node name, not the mac address. But I guess I can > > > > configure the mac address in the xen guest config file > > > > so I can live with this. 
> > > > > > > > Thanks, > > > > Dan > > > > > > > > > -----Original Message----- > > > > > From: Marc Grimme [mailto:gr...@at...] > > > > > Sent: Monday, April 27, 2009 12:52 AM > > > > > To: ope...@li... > > > > > Cc: Dan Magenheimer > > > > > Subject: Re: [OSR-users] New Preview RPMs for next Release > > > > > Candidate of > > > > > comoonics-bootimage > > > > > > > > > > On Friday 24 April 2009 00:42:17 Dan Magenheimer wrote: > > > > > > Hi Marc -- > > > > > > > > > > > > Thanks for the help. I got past my rpm problems and am > > > > > > now much further along but have hit another roadblock. > > > > > > > > > > > > First, FYI, I am using a different approach then the > > > > > > RHEL5 OCFS2 Shared Root Mini Howto, because of the > > > > > > way that I want to use the shared root. Specifically, > > > > > > I am first building and booting a root ocfs2 filesystem, > > > > > > using my own kernel (2.6.29) and my own initrd.img. With > > > > > > this (and before I install any OSR stuff), I am able > > > > > > to boot it as a Xen paravirtualized guest, using > > > > > > the Xen kernel= and ramdisk= config options; the root > > > > > > disk is NOT an LVM because I don't need a /boot. The > > > > > > > > > > No problem with this my osr-ocfs2 cluster is running exactly > > > > > the same. No LVM, > > > > > direct boot via kernel and initrd. That should not be a problem. > > > > > > > > > > > 2.6.29 kernel has CONFIG_IP_PNP and I pass in the > > > > > > IP address and hostname from the Xen config file. > > > > > > This all seems to work fine (for a single node). > > > > > > > > > > Ok. But still we need the mac adress for detecting the nodes > > > > > identity in > > > > > the /etc/cluster/cluster.conf. > > > > > Also you might try to set the onboot flag in the cluster.conf > > > > > at the nic > > > > > config to "no": > > > > > <com_info> > > > > > .. > > > > > <eth name="eth0" mac="..." onboot="no"/> > > > > > .. 
> > > > > </com_info> > > > > > I didn't test it yet (it not YET in my testcases) but I'm > > > > > pretty confident it > > > > > should work. > > > > > > > > > > > Next, I install the OSR rpms directly in the running > > > > > > ocfs2-root guest, then shut it down. > > > > > > > > > > Ok. so far so good. > > > > > > > > > > > Next, I mount the ocfs2-root-disk from another guest > > > > > > (that also has the OSR rpms installed) and follow > > > > > > the howto steps to create the cdsl infrastructure > > > > > > and links. Then I shut down the other guest. > > > > > > > > > > Could you recall the exact steps and outputs? > > > > > > > > > > > Next, I try to boot the OSR-modified ocfs2-root guest, > > > > > > but it has problems. It appears that /var doesn't > > > > > > exist as I get many messages such as: > > > > > > > > > > The not mouting of /var is very strange. It should put you in > > > > > a rescue shell. > > > > > Then type messages and send me the output. > > > > > > > > > > > /etc/rc.d/rc.sysinit: /var/log/dmesg: No such file or > > > > directory > > > > > > > That's just before it trys to boot. That's somehow to > > > > far advanced. > > > > > > > > and then the boot process seems to hang trying to start the > > > > > > System Logger. No, it just takes a very long time and > > > > > > eventually I get to a login prompt. (Or I can boot > > > > > > single-user mode and get the same error messages, but > > > > > > get to a bash prompt.) > > > > > > > > > > > > With a "ls -l /var", I see: > > > > > > > > > > > > lrwxrwxrwx 1 root root 14 <date> /var -> cdsl.local/var > > > > > > > > > > That's perfectly ok. > > > > > > > > > > > (Note no leading / before cdsl.local) > > > > > > > > > > > > but "ls -ld /cdsl.local" shows it is empty. > > > > > > > > > > That's strange. > > > > > > > > > > > Browsing around, I see that /cluster/cdsl is populated > > > > > > (with subdirectories 0 ... 7 and default) and each has > > > > > > an etc and a var subdirectory. 
/cluster/shared has > > > > > > a var subdirectory and a var/lib subdirectory. > > > > > > > > > > That's again perfectly ok.- > > > > > > > > > > > So I'm guessing that cdsl.local should somehow be > > > > > > linked to /cluster but isn't. True? > > > > > > > > > > Right but it is not liked but bind mounted. That means: > > > > > mount --bind /cluster/cdsl/<nodeid> /cdsl.local > > > > > but that's done in the initrd automatically so you should not > > > > > have to bother > > > > > about that. > > > > > > > > > > > One other thing I should mention... since my cluster.conf > > > > > > has 8 nodes numbered 0 to 7, in the "mount --bind" > > > > > > command during the cdsl setup steps, I used cluster/cdsl/0 > > > > > > instead of cluster/cdsl/1 to bind to cdsl.local. > > > > > > > > > > How did you "use" that. That should be done automatically > > > > > shouldn't it? > > > > > > > > > > > Any ideas? Maybe your initrd creates some necessary links > > > > > > and mine does not? (I tried booting with your initrd, > > > > > > but my ocfs2-root failed to mount giving a kernel panic... > > > > > > have you tested with linux-2.6.29? The error message > > > > > > "Heartbeat has to be started to mount a read-write > > > > > > clustered device" looks like it comes from a somewhat > > > > > > recent ocfs2 kernel patch I found here: > > > > > > http://www.mail-archive.com/ocf...@os.../msg00293.html > > > > > > > > and I worked around it by mounting with -o heartbeat=local) > > > > > > > > > > Sorry this is so long! > > > > > > > > How does your /etc/cluster/cluster.conf look like? > > > > > > > > -- > > > > Gruss / Regards, > > > > > > > > Marc Grimme > > > > http://www.atix.de/ http://www.open-sharedroot.org/ > > > > -------------------------------------------------------------- > > ------------- > > > > >--- Register Now & Save for Velocity, the Web Performance & > > > > Operations > > > > > Conference from O'Reilly Media. 
Velocity features a full day of > > > expert-led, hands-on workshops and two days of sessions > > > > from industry > > > > > leaders in dedicated Performance & Operations tracks. Use > > > > code vel09scf > > > > > and Save an extra 15% before 5/3. http://p.sf.net/sfu/velocityconf > > > _______________________________________________ > > > Open-sharedroot-users mailing list > > > Ope...@li... > > > https://lists.sourceforge.net/lists/listinfo/open-sharedroot-users > > > > -- > > Gruss / Regards, > > > > Marc Grimme > > Phone: +49-89 452 3538-14 > > http://www.atix.de/ http://www.open-sharedroot.org/ > > > > ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 | > > 85716 Unterschleissheim | www.atix.de | www.open-sharedroot.org > > > > Registergericht: Amtsgericht Muenchen, Registernummer: HRB > > 168930, USt.-Id.: > > DE209485962 | Vorstand: Marc Grimme, Mark Hlawatschek, Thomas > > Merz (Vors.) | > > Vorsitzender des Aufsichtsrats: Dr. Martin Buss > > --------------------------------------------------------------------------- >--- Register Now & Save for Velocity, the Web Performance & Operations > Conference from O'Reilly Media. Velocity features a full day of > expert-led, hands-on workshops and two days of sessions from industry > leaders in dedicated Performance & Operations tracks. Use code vel09scf > and Save an extra 15% before 5/3. http://p.sf.net/sfu/velocityconf > _______________________________________________ > Open-sharedroot-users mailing list > Ope...@li... > https://lists.sourceforge.net/lists/listinfo/open-sharedroot-users -- Gruss / Regards, Marc Grimme Phone: +49-89 452 3538-14 http://www.atix.de/ http://www.open-sharedroot.org/ ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 | 85716 Unterschleissheim | www.atix.de | www.open-sharedroot.org Registergericht: Amtsgericht Muenchen, Registernummer: HRB 168930, USt.-Id.: DE209485962 | Vorstand: Marc Grimme, Mark Hlawatschek, Thomas Merz (Vors.) 
| Vorsitzender des Aufsichtsrats: Dr. Martin Buss |
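The bind mount Marc describes in this thread (mount --bind /cluster/cdsl/<nodeid> /cdsl.local) can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name cdsl_bind_command is hypothetical and is not part of the OSR initrd, and it prints the command instead of running it, so it can be tried without root or a cluster filesystem.

```shell
# Hypothetical helper, not taken from the OSR initrd scripts.
# Prints (does not run) the bind mount for the given numeric node id.
cdsl_bind_command() {
    nodeid="$1"
    case "$nodeid" in
        ''|*[!0-9]*)
            # Reject empty or non-numeric ids instead of building a bad path.
            echo "invalid node id: $nodeid" >&2
            return 1
            ;;
    esac
    echo "mount --bind /cluster/cdsl/$nodeid /cdsl.local"
}
```

For a cluster.conf whose nodes are numbered 0 to 7, as in Dan's setup, calling cdsl_bind_command 0 prints the command for the first node. According to Marc, the initrd normally performs this step on its own.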
From: Dan M. <dan...@or...> - 2009-04-28 15:30:04
|
Hi Marc --

As you probably know, Oracle's Enterprise Linux (EL) is a
"clone" of Red Hat Enterprise Linux (RHEL) and is essentially
identical except for bug fixes. I don't know if OSR needs to
distinguish between EL and RHEL, but I'm sure you know if they do.

I'm told that you can distinguish between EL5 and RHEL5 for
RH/EL5ga, RH/EL5u1 and RH/EL5u2 because the file
/etc/redhat-release has

"Enterprise Linux Enterprise Linux release 5.X (codename)"

in EL but has

"Red Hat Enterprise Linux release 5.x (different_codename)"

in RHEL. (Sorry, I don't know all the codenames.) Also, the file
/etc/enterprise-release exists on EL but not on RHEL and has
the same contents as /etc/redhat-release.

HOWEVER, STARTING IN RH/EL5u3, this changes. The file
/etc/redhat-release is the SAME for EL5u3 and RHEL5u3:

"Red Hat Enterprise Linux release 5.3 (codename)"

And for EL5u3, the file /etc/enterprise-release is different
than /etc/redhat-release. For EL5u3, /etc/enterprise-release has

"Enterprise Linux Enterprise Linux release 5.3 (codename)"

but /etc/redhat-release has:

"Red Hat Enterprise Linux release 5.3 (different_codename)"

It looks to me like the OSR scripts already distinguish between
EL5 and RHEL5, so there is probably a bug somewhere.

Hope that helps!

Thanks,
Dan

> -----Original Message-----
> From: Marc Grimme [mailto:gr...@at...]
> Sent: Tuesday, April 28, 2009 1:05 AM
> To: Dan Magenheimer
> Cc: ope...@li...
> Subject: Re: [OSR-users] New Preview RPMs for next Release
> Candidate of
> comoonics-bootimage
>
>
> On Tuesday 28 April 2009 02:51:49 Dan Magenheimer wrote:
> > Attached is the output from the messages command to
> > the rescueshell for a freshly created initrd (with
> > a smaller lib/modules).
> >
> > Some messages I see on the console which do not appear
> > in the "messages" output:
> >
> > Detecting Hardware ./etc/hardware-lib.sh: line 458:
> > unknown_hardware_detect: command not found
> That's the wrong distribution detection.
> This function is called as ${distribution}_hardware_detect. > Which will fail in > your case. Send me a cat /etc/*-release and ls -1 > /etc/*-release and I'll > make a patch for it. > > > > Loading modules for all found network cardsFATAL: Module > xennet not found. > > > > error: "xen.independent_wallclock" is an unknown key > > > > > -----Original Message----- > > > From: Dan Magenheimer > > > Sent: Monday, April 27, 2009 6:14 PM > > > To: Marc Grimme; ope...@li... > > > Subject: Re: [OSR-users] New Preview RPMs for next Release > > > Candidate of > > > comoonics-bootimage > > > > > > > > > Hi Marc -- > > > > > > Thanks for the reply. First, let me clarify that I am > > > trying two different approaches: > > > > > > (A) Use your entire OSR RHEL5+OCFS2 howto (with > > > a Xen-paravirtualized 2.6.29 kernel running as a > > > guest on Xen-3.4.0), or > > > (B) Build my own initrd and use your OSR scripts > > > only after my initrd finishes. > > > > > > I couldn't get (A) to work... the initrd built using > > > the howto was failing very early, so I decided to > > > try with (B). I hoped that (B) would be easier because > > > your code is very general to handle many different > > > kinds of systems, and mine could be much more specific. > > > > > > HOWEVER, I just discovered one problem with (A). > > > Your mkinitrd process builds a huge (200MB) initrd > > > and there appears to be a BUG IN XEN that fails > > > to load large initrds (larger than about 100MB)! > > > > > > Your initrd is so large because my lib/modules/2.6.29 > > > is very large. If I delete that from initrd built > > > using the howto, the initrd.gz is only about 24MB > > > and I am able to boot and see the ATIX logo and > > > it drops into the rescue shell (because I haven't > > > specified the MAC-Addresses I think). However other > > > boot errors complain about modules that are missing. > > > I will try to build a kernel with fewer modules and > > > see how that goes. 
> > > > > > Other feedback: > > > > > > I wonder if your script detecting "distribution" > > > and "shortdistribution" are correct for all versions > > > of RHEL and Oracle Enterprise Linux? I am getting > > > "unknown" for both (from listparameters in the rescue > > > shell), though I am booting Oracle Enterprise Linux 5 > > > update 2. > > > > > > I also see that your test for detecting xen in xen-lib.sh > > > is not very good. You may want to test for /proc/xen > > > instead of (or in addition to) /etc/xen. > > > > > > And why when "Loading modules for all found network cards" > > > do I get "FATAL: Module xennet not found"? I have xen > > > networking compiled into my kernel so there is no module > > > for it. Should this be fatal? > > > > > > > Ok. But still we need the mac adress for detecting the nodes > > > > identity in the /etc/cluster/cluster.conf. > > > > > > Is that really necessary? Ocfs2 only requires the > > > node name, not the mac address. But I guess I can > > > configure the mac address in the xen guest config file > > > so I can live with this. > > > > > > Thanks, > > > Dan > > > > > > > -----Original Message----- > > > > From: Marc Grimme [mailto:gr...@at...] > > > > Sent: Monday, April 27, 2009 12:52 AM > > > > To: ope...@li... > > > > Cc: Dan Magenheimer > > > > Subject: Re: [OSR-users] New Preview RPMs for next Release > > > > Candidate of > > > > comoonics-bootimage > > > > > > > > On Friday 24 April 2009 00:42:17 Dan Magenheimer wrote: > > > > > Hi Marc -- > > > > > > > > > > Thanks for the help. I got past my rpm problems and am > > > > > now much further along but have hit another roadblock. > > > > > > > > > > First, FYI, I am using a different approach then the > > > > > RHEL5 OCFS2 Shared Root Mini Howto, because of the > > > > > way that I want to use the shared root. Specifically, > > > > > I am first building and booting a root ocfs2 filesystem, > > > > > using my own kernel (2.6.29) and my own initrd.img. 
With > > > > > this (and before I install any OSR stuff), I am able > > > > > to boot it as a Xen paravirtualized guest, using > > > > > the Xen kernel= and ramdisk= config options; the root > > > > > disk is NOT an LVM because I don't need a /boot. The > > > > > > > > No problem with this my osr-ocfs2 cluster is running exactly > > > > the same. No LVM, > > > > direct boot via kernel and initrd. That should not be a problem. > > > > > > > > > 2.6.29 kernel has CONFIG_IP_PNP and I pass in the > > > > > IP address and hostname from the Xen config file. > > > > > This all seems to work fine (for a single node). > > > > > > > > Ok. But still we need the mac adress for detecting the nodes > > > > identity in > > > > the /etc/cluster/cluster.conf. > > > > Also you might try to set the onboot flag in the cluster.conf > > > > at the nic > > > > config to "no": > > > > <com_info> > > > > .. > > > > <eth name="eth0" mac="..." onboot="no"/> > > > > .. > > > > </com_info> > > > > I didn't test it yet (it not YET in my testcases) but I'm > > > > pretty confident it > > > > should work. > > > > > > > > > Next, I install the OSR rpms directly in the running > > > > > ocfs2-root guest, then shut it down. > > > > > > > > Ok. so far so good. > > > > > > > > > Next, I mount the ocfs2-root-disk from another guest > > > > > (that also has the OSR rpms installed) and follow > > > > > the howto steps to create the cdsl infrastructure > > > > > and links. Then I shut down the other guest. > > > > > > > > Could you recall the exact steps and outputs? > > > > > > > > > Next, I try to boot the OSR-modified ocfs2-root guest, > > > > > but it has problems. It appears that /var doesn't > > > > > exist as I get many messages such as: > > > > > > > > The not mouting of /var is very strange. It should put you in > > > > a rescue shell. > > > > Then type messages and send me the output. 
> > > > > > > > > /etc/rc.d/rc.sysinit: /var/log/dmesg: No such file or > directory > > > > > > > > That's just before it trys to boot. That's somehow to > far advanced. > > > > > > > > > and then the boot process seems to hang trying to start the > > > > > System Logger. No, it just takes a very long time and > > > > > eventually I get to a login prompt. (Or I can boot > > > > > single-user mode and get the same error messages, but > > > > > get to a bash prompt.) > > > > > > > > > > With a "ls -l /var", I see: > > > > > > > > > > lrwxrwxrwx 1 root root 14 <date> /var -> cdsl.local/var > > > > > > > > That's perfectly ok. > > > > > > > > > (Note no leading / before cdsl.local) > > > > > > > > > > but "ls -ld /cdsl.local" shows it is empty. > > > > > > > > That's strange. > > > > > > > > > Browsing around, I see that /cluster/cdsl is populated > > > > > (with subdirectories 0 ... 7 and default) and each has > > > > > an etc and a var subdirectory. /cluster/shared has > > > > > a var subdirectory and a var/lib subdirectory. > > > > > > > > That's again perfectly ok.- > > > > > > > > > So I'm guessing that cdsl.local should somehow be > > > > > linked to /cluster but isn't. True? > > > > > > > > Right but it is not liked but bind mounted. That means: > > > > mount --bind /cluster/cdsl/<nodeid> /cdsl.local > > > > but that's done in the initrd automatically so you should not > > > > have to bother > > > > about that. > > > > > > > > > One other thing I should mention... since my cluster.conf > > > > > has 8 nodes numbered 0 to 7, in the "mount --bind" > > > > > command during the cdsl setup steps, I used cluster/cdsl/0 > > > > > instead of cluster/cdsl/1 to bind to cdsl.local. > > > > > > > > How did you "use" that. That should be done automatically > > > > shouldn't it? > > > > > > > > > Any ideas? Maybe your initrd creates some necessary links > > > > > and mine does not? 
(I tried booting with your initrd, > > > > > but my ocfs2-root failed to mount giving a kernel panic... > > > > > have you tested with linux-2.6.29? The error message > > > > > "Heartbeat has to be started to mount a read-write > > > > > clustered device" looks like it comes from a somewhat > > > > > recent ocfs2 kernel patch I found here: > > > > http://www.mail-archive.com/ocf...@os.../msg00293.html > > > > > > and I worked around it by mounting with -o heartbeat=local) > > > > > > > > Sorry this is so long! > > > > > > How does your /etc/cluster/cluster.conf look like? > > > > > > -- > > > Gruss / Regards, > > > > > > Marc Grimme > > > http://www.atix.de/ http://www.open-sharedroot.org/ > > > > > -------------------------------------------------------------- > ------------- > >--- Register Now & Save for Velocity, the Web Performance & > Operations > > Conference from O'Reilly Media. Velocity features a full day of > > expert-led, hands-on workshops and two days of sessions > from industry > > leaders in dedicated Performance & Operations tracks. Use > code vel09scf > > and Save an extra 15% before 5/3. http://p.sf.net/sfu/velocityconf > > _______________________________________________ > > Open-sharedroot-users mailing list > > Ope...@li... > > https://lists.sourceforge.net/lists/listinfo/open-sharedroot-users > > > > -- > Gruss / Regards, > > Marc Grimme > Phone: +49-89 452 3538-14 > http://www.atix.de/ http://www.open-sharedroot.org/ > > ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 | > 85716 Unterschleissheim | www.atix.de | www.open-sharedroot.org > > Registergericht: Amtsgericht Muenchen, Registernummer: HRB > 168930, USt.-Id.: > DE209485962 | Vorstand: Marc Grimme, Mark Hlawatschek, Thomas > Merz (Vors.) | > Vorsitzender des Aufsichtsrats: Dr. Martin Buss > > > |
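Dan's description of the release files suggests one way to tell EL5 and RHEL5 apart: look for /etc/enterprise-release first, since he reports that it exists only on EL. The following is a minimal sketch under that assumption, not the code in the OSR scripts. The function name detect_distribution, the returned labels, and the optional directory argument (used so the check can be run against a scratch directory) are all hypothetical.

```shell
# Hypothetical helper, not taken from the OSR scripts.
# $1: directory holding the *-release files (defaults to /etc).
detect_distribution() {
    etc_dir="${1:-/etc}"
    if [ -f "$etc_dir/enterprise-release" ]; then
        # Reported in this thread to exist on Oracle EL but not on RHEL.
        echo "oracle-el"
    elif [ -f "$etc_dir/redhat-release" ]; then
        echo "rhel"
    else
        echo "unknown"
    fi
}
```

Checking for the file before reading /etc/redhat-release matters because, as described above, the contents of /etc/redhat-release no longer differ between EL and RHEL starting with 5u3.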
From: Marc G. <gr...@at...> - 2009-04-28 07:05:28
|
On Tuesday 28 April 2009 02:51:49 Dan Magenheimer wrote:
> Attached is the output from the messages command to
> the rescueshell for a freshly created initrd (with
> a smaller lib/modules).
>
> Some messages I see on the console which do not appear
> in the "messages" output:
>
> Detecting Hardware ./etc/hardware-lib.sh: line 458:
> unknown_hardware_detect: command not found
That's the wrong distribution detection.
This function is called as ${distribution}_hardware_detect, which will fail in
your case. Send me the output of cat /etc/*-release and ls -1 /etc/*-release
and I'll make a patch for it.
>
> Loading modules for all found network cardsFATAL: Module xennet not found.
>
> error: "xen.independent_wallclock" is an unknown key
>
> > -----Original Message-----
> > From: Dan Magenheimer
> > Sent: Monday, April 27, 2009 6:14 PM
> > To: Marc Grimme; ope...@li...
> > Subject: Re: [OSR-users] New Preview RPMs for next Release
> > Candidate of
> > comoonics-bootimage
> >
> >
> > Hi Marc --
> >
> > Thanks for the reply. First, let me clarify that I am
> > trying two different approaches:
> >
> > (A) Use your entire OSR RHEL5+OCFS2 howto (with
> > a Xen-paravirtualized 2.6.29 kernel running as a
> > guest on Xen-3.4.0), or
> > (B) Build my own initrd and use your OSR scripts
> > only after my initrd finishes.
> >
> > I couldn't get (A) to work... the initrd built using
> > the howto was failing very early, so I decided to
> > try with (B). I hoped that (B) would be easier because
> > your code is very general to handle many different
> > kinds of systems, and mine could be much more specific.
> >
> > HOWEVER, I just discovered one problem with (A).
> > Your mkinitrd process builds a huge (200MB) initrd
> > and there appears to be a BUG IN XEN that fails
> > to load large initrds (larger than about 100MB)!
> >
> > Your initrd is so large because my lib/modules/2.6.29
> > is very large.
If I delete that from initrd built > > using the howto, the initrd.gz is only about 24MB > > and I am able to boot and see the ATIX logo and > > it drops into the rescue shell (because I haven't > > specified the MAC-Addresses I think). However other > > boot errors complain about modules that are missing. > > I will try to build a kernel with fewer modules and > > see how that goes. > > > > Other feedback: > > > > I wonder if your script detecting "distribution" > > and "shortdistribution" are correct for all versions > > of RHEL and Oracle Enterprise Linux? I am getting > > "unknown" for both (from listparameters in the rescue > > shell), though I am booting Oracle Enterprise Linux 5 > > update 2. > > > > I also see that your test for detecting xen in xen-lib.sh > > is not very good. You may want to test for /proc/xen > > instead of (or in addition to) /etc/xen. > > > > And why when "Loading modules for all found network cards" > > do I get "FATAL: Module xennet not found"? I have xen > > networking compiled into my kernel so there is no module > > for it. Should this be fatal? > > > > > Ok. But still we need the mac adress for detecting the nodes > > > identity in the /etc/cluster/cluster.conf. > > > > Is that really necessary? Ocfs2 only requires the > > node name, not the mac address. But I guess I can > > configure the mac address in the xen guest config file > > so I can live with this. > > > > Thanks, > > Dan > > > > > -----Original Message----- > > > From: Marc Grimme [mailto:gr...@at...] > > > Sent: Monday, April 27, 2009 12:52 AM > > > To: ope...@li... > > > Cc: Dan Magenheimer > > > Subject: Re: [OSR-users] New Preview RPMs for next Release > > > Candidate of > > > comoonics-bootimage > > > > > > On Friday 24 April 2009 00:42:17 Dan Magenheimer wrote: > > > > Hi Marc -- > > > > > > > > Thanks for the help. I got past my rpm problems and am > > > > now much further along but have hit another roadblock. 
> > > > > > > > First, FYI, I am using a different approach then the > > > > RHEL5 OCFS2 Shared Root Mini Howto, because of the > > > > way that I want to use the shared root. Specifically, > > > > I am first building and booting a root ocfs2 filesystem, > > > > using my own kernel (2.6.29) and my own initrd.img. With > > > > this (and before I install any OSR stuff), I am able > > > > to boot it as a Xen paravirtualized guest, using > > > > the Xen kernel= and ramdisk= config options; the root > > > > disk is NOT an LVM because I don't need a /boot. The > > > > > > No problem with this my osr-ocfs2 cluster is running exactly > > > the same. No LVM, > > > direct boot via kernel and initrd. That should not be a problem. > > > > > > > 2.6.29 kernel has CONFIG_IP_PNP and I pass in the > > > > IP address and hostname from the Xen config file. > > > > This all seems to work fine (for a single node). > > > > > > Ok. But still we need the mac adress for detecting the nodes > > > identity in > > > the /etc/cluster/cluster.conf. > > > Also you might try to set the onboot flag in the cluster.conf > > > at the nic > > > config to "no": > > > <com_info> > > > .. > > > <eth name="eth0" mac="..." onboot="no"/> > > > .. > > > </com_info> > > > I didn't test it yet (it not YET in my testcases) but I'm > > > pretty confident it > > > should work. > > > > > > > Next, I install the OSR rpms directly in the running > > > > ocfs2-root guest, then shut it down. > > > > > > Ok. so far so good. > > > > > > > Next, I mount the ocfs2-root-disk from another guest > > > > (that also has the OSR rpms installed) and follow > > > > the howto steps to create the cdsl infrastructure > > > > and links. Then I shut down the other guest. > > > > > > Could you recall the exact steps and outputs? > > > > > > > Next, I try to boot the OSR-modified ocfs2-root guest, > > > > but it has problems. 
It appears that /var doesn't > > > > exist as I get many messages such as: > > > > > > The not mouting of /var is very strange. It should put you in > > > a rescue shell. > > > Then type messages and send me the output. > > > > > > > /etc/rc.d/rc.sysinit: /var/log/dmesg: No such file or directory > > > > > > That's just before it trys to boot. That's somehow to far advanced. > > > > > > > and then the boot process seems to hang trying to start the > > > > System Logger. No, it just takes a very long time and > > > > eventually I get to a login prompt. (Or I can boot > > > > single-user mode and get the same error messages, but > > > > get to a bash prompt.) > > > > > > > > With a "ls -l /var", I see: > > > > > > > > lrwxrwxrwx 1 root root 14 <date> /var -> cdsl.local/var > > > > > > That's perfectly ok. > > > > > > > (Note no leading / before cdsl.local) > > > > > > > > but "ls -ld /cdsl.local" shows it is empty. > > > > > > That's strange. > > > > > > > Browsing around, I see that /cluster/cdsl is populated > > > > (with subdirectories 0 ... 7 and default) and each has > > > > an etc and a var subdirectory. /cluster/shared has > > > > a var subdirectory and a var/lib subdirectory. > > > > > > That's again perfectly ok.- > > > > > > > So I'm guessing that cdsl.local should somehow be > > > > linked to /cluster but isn't. True? > > > > > > Right but it is not liked but bind mounted. That means: > > > mount --bind /cluster/cdsl/<nodeid> /cdsl.local > > > but that's done in the initrd automatically so you should not > > > have to bother > > > about that. > > > > > > > One other thing I should mention... since my cluster.conf > > > > has 8 nodes numbered 0 to 7, in the "mount --bind" > > > > command during the cdsl setup steps, I used cluster/cdsl/0 > > > > instead of cluster/cdsl/1 to bind to cdsl.local. > > > > > > How did you "use" that. That should be done automatically > > > shouldn't it? > > > > > > > Any ideas? 
Maybe your initrd creates some necessary links > > > > and mine does not? (I tried booting with your initrd, > > > > but my ocfs2-root failed to mount giving a kernel panic... > > > > have you tested with linux-2.6.29? The error message > > > > "Heartbeat has to be started to mount a read-write > > > > clustered device" looks like it comes from a somewhat > > > > recent ocfs2 kernel patch I found here: > > http://www.mail-archive.com/ocf...@os.../msg00293.html > > > > and I worked around it by mounting with -o heartbeat=local) > > > > > > Sorry this is so long! > > > > How does your /etc/cluster/cluster.conf look like? > > > > -- > > Gruss / Regards, > > > > Marc Grimme > > http://www.atix.de/ http://www.open-sharedroot.org/ > > --------------------------------------------------------------------------- >--- Register Now & Save for Velocity, the Web Performance & Operations > Conference from O'Reilly Media. Velocity features a full day of > expert-led, hands-on workshops and two days of sessions from industry > leaders in dedicated Performance & Operations tracks. Use code vel09scf > and Save an extra 15% before 5/3. http://p.sf.net/sfu/velocityconf > _______________________________________________ > Open-sharedroot-users mailing list > Ope...@li... > https://lists.sourceforge.net/lists/listinfo/open-sharedroot-users -- Gruss / Regards, Marc Grimme Phone: +49-89 452 3538-14 http://www.atix.de/ http://www.open-sharedroot.org/ ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 | 85716 Unterschleissheim | www.atix.de | www.open-sharedroot.org Registergericht: Amtsgericht Muenchen, Registernummer: HRB 168930, USt.-Id.: DE209485962 | Vorstand: Marc Grimme, Mark Hlawatschek, Thomas Merz (Vors.) | Vorsitzender des Aufsichtsrats: Dr. Martin Buss |
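Marc's explanation of the "unknown_hardware_detect: command not found" error (the function is looked up as ${distribution}_hardware_detect) can be illustrated with a dispatch that falls back to a generic routine when no distribution-specific function is defined. This is a sketch only: rhel5_hardware_detect, default_hardware_detect and hardware_detect are hypothetical names with placeholder bodies, and the real hardware-lib.sh may be organized differently.

```shell
# Hypothetical names throughout; the real hardware-lib.sh may differ.
rhel5_hardware_detect()   { echo "rhel5 detection"; }
default_hardware_detect() { echo "generic detection"; }

# $1: the detected distribution string, e.g. "rhel5" or "unknown".
hardware_detect() {
    distribution="$1"
    fn="${distribution}_hardware_detect"
    # command -v succeeds for shell functions as well as external commands.
    if command -v "$fn" >/dev/null 2>&1; then
        "$fn"
    else
        # No distribution-specific routine: use the generic one
        # instead of failing with "command not found".
        default_hardware_detect
    fi
}
```

With this shape, a distribution string of "unknown" would degrade to the generic routine instead of aborting hardware detection.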
From: Marc G. <gr...@at...> - 2009-04-28 07:03:39
|
On Tuesday 28 April 2009 02:13:41 Dan Magenheimer wrote:
> Hi Marc --
>
> Thanks for the reply. First, let me clarify that I am
> trying two different approaches:
>
> (A) Use your entire OSR RHEL5+OCFS2 howto (with
> a Xen-paravirtualized 2.6.29 kernel running as a
> guest on Xen-3.4.0), or
> (B) Build my own initrd and use your OSR scripts
> only after my initrd finishes.
>
> I couldn't get (A) to work... the initrd built using
> the howto was failing very early, so I decided to
> try with (B). I hoped that (B) would be easier because
> your code is very general to handle many different
> kinds of systems, and mine could be much more specific.
>
> HOWEVER, I just discovered one problem with (A).
> Your mkinitrd process builds a huge (200MB) initrd
> and there appears to be a BUG IN XEN that fails
> to load large initrds (larger than about 100MB)!
>
> Your initrd is so large because my lib/modules/2.6.29
> is very large. If I delete that from initrd built
> using the howto, the initrd.gz is only about 24MB
> and I am able to boot and see the ATIX logo and
> it drops into the rescue shell (because I haven't
> specified the MAC-Addresses I think). However other
> boot errors complain about modules that are missing.
> I will try to build a kernel with fewer modules and
> see how that goes.
You could also call mkinitrd with the -l/-L option. This builds a minimal
initrd from only the loaded or specified modules. See mkinitrd -h.
>
> Other feedback:
>
> I wonder if your script detecting "distribution"
> and "shortdistribution" are correct for all versions
> of RHEL and Oracle Enterprise Linux? I am getting
> "unknown" for both (from listparameters in the rescue
> shell), though I am booting Oracle Enterprise Linux 5
> update 2.
I didn't test it for Oracle EL5 and I would bet it does not work. Could you
send me the contents of /etc/redhat-release, or is it /etc/oracle-release? Then
I will build a patch to also support OEL5.
>
> I also see that your test for detecting xen in xen-lib.sh
> is not very good. You may want to test for /proc/xen
> instead of (or in addition to) /etc/xen.
/etc/xen should only be tested for Dom0, "NOT" for DomU.
DomU detection => xen_domx_detect
Dom0 detection => xen_dom0_detect
in etc/xen-lib.sh
>
> And why when "Loading modules for all found network cards"
> do I get "FATAL: Module xennet not found"? I have xen
> networking compiled into my kernel so there is no module
> for it. Should this be fatal?
Hm. Up to now it seems to be ;-) . Nowadays I have only seen kernels which are
modularized. So this is a use case where the error detection detects a
nonexistent error. I'll have to think about it.
As a workaround you could add a xennet off to /etc/modprobe.conf. At least I
would suppose it to work.
>
> > Ok. But still we need the mac adress for detecting the nodes
> > identity in the /etc/cluster/cluster.conf.
>
> Is that really necessary? Ocfs2 only requires the
> node name, not the mac address. But I guess I can
> configure the mac address in the xen guest config file
> so I can live with this.
Yes, this is still necessary. We are going to make it more flexible in future
versions. Up to now that's the only way. Sorry about this.
>
> Thanks,
> Dan
You're welcome
Marc.
--
Gruss / Regards,

Marc Grimme
http://www.atix.de/ http://www.open-sharedroot.org/ |
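The /proc/xen versus /etc/xen question above can be made concrete with a small check. The sketch below is not the code behind xen_dom0_detect or xen_domx_detect in etc/xen-lib.sh. The function name xen_role and its optional root argument are hypothetical, and it assumes that a control domain lists "control_d" in /proc/xen/capabilities while a guest does not.

```shell
# Hypothetical sketch; not the code in etc/xen-lib.sh.
# $1: filesystem root to inspect (defaults to /), so the check can be
# exercised against a scratch directory.
xen_role() {
    root="${1:-/}"
    if [ ! -d "$root/proc/xen" ]; then
        # No /proc/xen at all: treat as not running under Xen.
        echo "none"
    elif grep -q control_d "$root/proc/xen/capabilities" 2>/dev/null; then
        # Assumption: only the control domain advertises "control_d".
        echo "dom0"
    else
        echo "domU"
    fi
}
```

Unlike a test for /etc/xen, this does not depend on which packages happen to be installed in the guest, which seems to be the concern Dan raises.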
From: Dan M. <dan...@or...> - 2009-04-28 00:52:21
|
Attached is the output from the messages command to
the rescueshell for a freshly created initrd (with
a smaller lib/modules).

Some messages I see on the console which do not appear
in the "messages" output:

Detecting Hardware ./etc/hardware-lib.sh: line 458: unknown_hardware_detect: command not found

Loading modules for all found network cardsFATAL: Module xennet not found.

error: "xen.independent_wallclock" is an unknown key

> -----Original Message-----
> From: Dan Magenheimer
> Sent: Monday, April 27, 2009 6:14 PM
> To: Marc Grimme; ope...@li...
> Subject: Re: [OSR-users] New Preview RPMs for next Release
> Candidate of
> comoonics-bootimage
>
>
> Hi Marc --
>
> Thanks for the reply. First, let me clarify that I am
> trying two different approaches:
>
> (A) Use your entire OSR RHEL5+OCFS2 howto (with
> a Xen-paravirtualized 2.6.29 kernel running as a
> guest on Xen-3.4.0), or
> (B) Build my own initrd and use your OSR scripts
> only after my initrd finishes.
>
> I couldn't get (A) to work... the initrd built using
> the howto was failing very early, so I decided to
> try with (B). I hoped that (B) would be easier because
> your code is very general to handle many different
> kinds of systems, and mine could be much more specific.
>
> HOWEVER, I just discovered one problem with (A).
> Your mkinitrd process builds a huge (200MB) initrd
> and there appears to be a BUG IN XEN that fails
> to load large initrds (larger than about 100MB)!
>
> Your initrd is so large because my lib/modules/2.6.29
> is very large. If I delete that from initrd built
> using the howto, the initrd.gz is only about 24MB
> and I am able to boot and see the ATIX logo and
> it drops into the rescue shell (because I haven't
> specified the MAC-Addresses I think). However other
> boot errors complain about modules that are missing.
> I will try to build a kernel with fewer modules and
> see how that goes.
> > Other feedback: > > I wonder if your script detecting "distribution" > and "shortdistribution" are correct for all versions > of RHEL and Oracle Enterprise Linux? I am getting > "unknown" for both (from listparameters in the rescue > shell), though I am booting Oracle Enterprise Linux 5 > update 2. > > I also see that your test for detecting xen in xen-lib.sh > is not very good. You may want to test for /proc/xen > instead of (or in addition to) /etc/xen. > > And why when "Loading modules for all found network cards" > do I get "FATAL: Module xennet not found"? I have xen > networking compiled into my kernel so there is no module > for it. Should this be fatal? > > > Ok. But still we need the mac adress for detecting the nodes > > identity in the /etc/cluster/cluster.conf. > > Is that really necessary? Ocfs2 only requires the > node name, not the mac address. But I guess I can > configure the mac address in the xen guest config file > so I can live with this. > > Thanks, > Dan > > > -----Original Message----- > > From: Marc Grimme [mailto:gr...@at...] > > Sent: Monday, April 27, 2009 12:52 AM > > To: ope...@li... > > Cc: Dan Magenheimer > > Subject: Re: [OSR-users] New Preview RPMs for next Release > > Candidate of > > comoonics-bootimage > > > > > > On Friday 24 April 2009 00:42:17 Dan Magenheimer wrote: > > > Hi Marc -- > > > > > > Thanks for the help. I got past my rpm problems and am > > > now much further along but have hit another roadblock. > > > > > > First, FYI, I am using a different approach then the > > > RHEL5 OCFS2 Shared Root Mini Howto, because of the > > > way that I want to use the shared root. Specifically, > > > I am first building and booting a root ocfs2 filesystem, > > > using my own kernel (2.6.29) and my own initrd.img. 
With > > > this (and before I install any OSR stuff), I am able > > > to boot it as a Xen paravirtualized guest, using > > > the Xen kernel= and ramdisk= config options; the root > > > disk is NOT an LVM because I don't need a /boot. The > > No problem with this my osr-ocfs2 cluster is running exactly > > the same. No LVM, > > direct boot via kernel and initrd. That should not be a problem. > > > 2.6.29 kernel has CONFIG_IP_PNP and I pass in the > > > IP address and hostname from the Xen config file. > > > This all seems to work fine (for a single node). > > Ok. But still we need the mac adress for detecting the nodes > > identity in > > the /etc/cluster/cluster.conf. > > Also you might try to set the onboot flag in the cluster.conf > > at the nic > > config to "no": > > <com_info> > > .. > > <eth name="eth0" mac="..." onboot="no"/> > > .. > > </com_info> > > I didn't test it yet (it not YET in my testcases) but I'm > > pretty confident it > > should work. > > > > > > Next, I install the OSR rpms directly in the running > > > ocfs2-root guest, then shut it down. > > Ok. so far so good. > > > > > > Next, I mount the ocfs2-root-disk from another guest > > > (that also has the OSR rpms installed) and follow > > > the howto steps to create the cdsl infrastructure > > > and links. Then I shut down the other guest. > > Could you recall the exact steps and outputs? > > > > > > Next, I try to boot the OSR-modified ocfs2-root guest, > > > but it has problems. It appears that /var doesn't > > > exist as I get many messages such as: > > The not mouting of /var is very strange. It should put you in > > a rescue shell. > > Then type messages and send me the output. > > > > > > /etc/rc.d/rc.sysinit: /var/log/dmesg: No such file or directory > > That's just before it trys to boot. That's somehow to far advanced. > > > > > > and then the boot process seems to hang trying to start the > > > System Logger. 
No, it just takes a very long time and > > > eventually I get to a login prompt. (Or I can boot > > > single-user mode and get the same error messages, but > > > get to a bash prompt.) > > > > > > With a "ls -l /var", I see: > > > > > > lrwxrwxrwx 1 root root 14 <date> /var -> cdsl.local/var > > That's perfectly ok. > > > > > > (Note no leading / before cdsl.local) > > > > > > but "ls -ld /cdsl.local" shows it is empty. > > That's strange. > > > > > > Browsing around, I see that /cluster/cdsl is populated > > > (with subdirectories 0 ... 7 and default) and each has > > > an etc and a var subdirectory. /cluster/shared has > > > a var subdirectory and a var/lib subdirectory. > > That's again perfectly ok.- > > > > > > So I'm guessing that cdsl.local should somehow be > > > linked to /cluster but isn't. True? > > Right but it is not liked but bind mounted. That means: > > mount --bind /cluster/cdsl/<nodeid> /cdsl.local > > but that's done in the initrd automatically so you should not > > have to bother > > about that. > > > > > > One other thing I should mention... since my cluster.conf > > > has 8 nodes numbered 0 to 7, in the "mount --bind" > > > command during the cdsl setup steps, I used cluster/cdsl/0 > > > instead of cluster/cdsl/1 to bind to cdsl.local. > > How did you "use" that. That should be done automatically > > shouldn't it? > > > > > > Any ideas? Maybe your initrd creates some necessary links > > > and mine does not? (I tried booting with your initrd, > > > but my ocfs2-root failed to mount giving a kernel panic... > > > have you tested with linux-2.6.29? The error message > > > "Heartbeat has to be started to mount a read-write > > > clustered device" looks like it comes from a somewhat > > > recent ocfs2 kernel patch I found here: > > > http://www.mail-archive.com/ocf...@os.../msg00293.html > > and I worked around it by mounting with -o heartbeat=local) > > > > Sorry this is so long! > How does your /etc/cluster/cluster.conf look like? 
> > -- > Gruss / Regards, > > Marc Grimme > http://www.atix.de/ http://www.open-sharedroot.org/ > > ------------------------------------------------------------------------------ Register Now & Save for Velocity, the Web Performance & Operations Conference from O'Reilly Media. Velocity features a full day of expert-led, hands-on workshops and two days of sessions from industry leaders in dedicated Performance & Operations tracks. Use code vel09scf and Save an extra 15% before 5/3. http://p.sf.net/sfu/velocityconf _______________________________________________ Open-sharedroot-users mailing list Ope...@li... https://lists.sourceforge.net/lists/listinfo/open-sharedroot-users |
From: Dan M. <dan...@or...> - 2009-04-28 00:14:13
|
Hi Marc --

Thanks for the reply. First, let me clarify that I am trying two different approaches:

(A) Use your entire OSR RHEL5+OCFS2 howto (with a Xen-paravirtualized 2.6.29 kernel running as a guest on Xen-3.4.0), or
(B) Build my own initrd and use your OSR scripts only after my initrd finishes.

I couldn't get (A) to work... the initrd built using the howto was failing very early, so I decided to try with (B). I hoped that (B) would be easier because your code is very general to handle many different kinds of systems, and mine could be much more specific.

HOWEVER, I just discovered one problem with (A). Your mkinitrd process builds a huge (200MB) initrd and there appears to be a BUG IN XEN that fails to load large initrds (larger than about 100MB)!

Your initrd is so large because my lib/modules/2.6.29 is very large. If I delete that from initrd built using the howto, the initrd.gz is only about 24MB and I am able to boot and see the ATIX logo and it drops into the rescue shell (because I haven't specified the MAC-Addresses I think). However other boot errors complain about modules that are missing. I will try to build a kernel with fewer modules and see how that goes.

Other feedback:

I wonder if your script detecting "distribution" and "shortdistribution" are correct for all versions of RHEL and Oracle Enterprise Linux? I am getting "unknown" for both (from listparameters in the rescue shell), though I am booting Oracle Enterprise Linux 5 update 2.

I also see that your test for detecting xen in xen-lib.sh is not very good. You may want to test for /proc/xen instead of (or in addition to) /etc/xen.

And why when "Loading modules for all found network cards" do I get "FATAL: Module xennet not found"? I have xen networking compiled into my kernel so there is no module for it. Should this be fatal?

> Ok. But still we need the mac adress for detecting the nodes
> identity in the /etc/cluster/cluster.conf.

Is that really necessary? Ocfs2 only requires the node name, not the mac address. But I guess I can configure the mac address in the xen guest config file so I can live with this.

Thanks,
Dan
|
From: Marc G. <gr...@at...> - 2009-04-27 06:54:23
|
On Friday 24 April 2009 01:20:15 Dan Magenheimer wrote:
> A couple more items before I quit for the day:
>
> I manually hacked in a symbolic link for /var so
> the boot would work better and turned on -x in
> the bootsr script.
>
> I am also seeing a couple of messages of the type:
>
> Error: cannot find /var/comoonics/chrootpath
>
> and I see that message comes from manage_chroot.sh
Hmm, again strange. That must exist. Try to mkdir /var/comoonics. Then reboot.

Most important is to see the output of your boot process. I think the source of your problems is to be found there.
--
Gruss / Regards,

Marc Grimme
http://www.atix.de/ http://www.open-sharedroot.org/
|
From: Marc G. <gr...@at...> - 2009-04-27 06:52:34
|
On Friday 24 April 2009 00:42:17 Dan Magenheimer wrote:
> Hi Marc --
>
> Thanks for the help. I got past my rpm problems and am
> now much further along but have hit another roadblock.
>
> First, FYI, I am using a different approach than the
> RHEL5 OCFS2 Shared Root Mini Howto, because of the
> way that I want to use the shared root. Specifically,
> I am first building and booting a root ocfs2 filesystem,
> using my own kernel (2.6.29) and my own initrd.img. With
> this (and before I install any OSR stuff), I am able
> to boot it as a Xen paravirtualized guest, using
> the Xen kernel= and ramdisk= config options; the root
> disk is NOT an LVM because I don't need a /boot. The
No problem with this; my osr-ocfs2 cluster is running exactly the same way: no LVM, direct boot via kernel and initrd. That should not be a problem.
> 2.6.29 kernel has CONFIG_IP_PNP and I pass in the
> IP address and hostname from the Xen config file.
> This all seems to work fine (for a single node).
Ok. But we still need the MAC address for detecting the node's identity in the /etc/cluster/cluster.conf.
Also you might try to set the onboot flag at the nic config in the cluster.conf to "no":
<com_info>
..
<eth name="eth0" mac="..." onboot="no"/>
..
</com_info>
I didn't test it yet (it is not YET in my test cases) but I'm pretty confident it should work.
>
> Next, I install the OSR rpms directly in the running
> ocfs2-root guest, then shut it down.
Ok. So far so good.
>
> Next, I mount the ocfs2-root-disk from another guest
> (that also has the OSR rpms installed) and follow
> the howto steps to create the cdsl infrastructure
> and links. Then I shut down the other guest.
Could you recall the exact steps and outputs?
>
> Next, I try to boot the OSR-modified ocfs2-root guest,
> but it has problems. It appears that /var doesn't
> exist as I get many messages such as:
That /var is not mounted is very strange. It should put you in a rescue shell. Then type messages and send me the output.
>
> /etc/rc.d/rc.sysinit: /var/log/dmesg: No such file or directory
That's just before it tries to boot. That's somehow too far advanced.
>
> and then the boot process seems to hang trying to start the
> System Logger. No, it just takes a very long time and
> eventually I get to a login prompt. (Or I can boot
> single-user mode and get the same error messages, but
> get to a bash prompt.)
>
> With a "ls -l /var", I see:
>
> lrwxrwxrwx 1 root root 14 <date> /var -> cdsl.local/var
That's perfectly ok.
>
> (Note no leading / before cdsl.local)
>
> but "ls -ld /cdsl.local" shows it is empty.
That's strange.
>
> Browsing around, I see that /cluster/cdsl is populated
> (with subdirectories 0 ... 7 and default) and each has
> an etc and a var subdirectory. /cluster/shared has
> a var subdirectory and a var/lib subdirectory.
That's again perfectly ok.
>
> So I'm guessing that cdsl.local should somehow be
> linked to /cluster but isn't. True?
Right, but it is not linked but bind mounted. That means:
mount --bind /cluster/cdsl/<nodeid> /cdsl.local
But that's done in the initrd automatically, so you should not have to bother about that.
>
> One other thing I should mention... since my cluster.conf
> has 8 nodes numbered 0 to 7, in the "mount --bind"
> command during the cdsl setup steps, I used cluster/cdsl/0
> instead of cluster/cdsl/1 to bind to cdsl.local.
How did you "use" that? That should be done automatically, shouldn't it?
>
> Any ideas? Maybe your initrd creates some necessary links
> and mine does not? (I tried booting with your initrd,
> but my ocfs2-root failed to mount giving a kernel panic...
> have you tested with linux-2.6.29? The error message
> "Heartbeat has to be started to mount a read-write
> clustered device" looks like it comes from a somewhat
> recent ocfs2 kernel patch I found here:
> http://www.mail-archive.com/ocf...@os.../msg00293.html
> and I worked around it by mounting with -o heartbeat=local)
>
> Sorry this is so long!
What does your /etc/cluster/cluster.conf look like?
--
Gruss / Regards,

Marc Grimme
http://www.atix.de/ http://www.open-sharedroot.org/
|
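For anyone following along, the bind mount described in the reply above, and the reason /var looks missing without it, can be sketched like this. The helper names cdsl_bind_cmd and cdsl_link_ok are hypothetical and not part of comoonics-bootimage; the default paths are the /cluster/cdsl and /cdsl.local values that appear in this thread. The first function only prints the command (a dry run), it does not mount anything.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of comoonics-bootimage.

# Print the bind mount command for a given node id (dry run only).
cdsl_bind_cmd() {
    nodeid="$1"
    prefix="${2:-/cluster/cdsl}"    # per-node trees, e.g. /cluster/cdsl/0
    local_dir="${3:-/cdsl.local}"   # mount point the /var symlink goes through
    echo "mount --bind $prefix/$nodeid $local_dir"
}

# Succeed only if the given path resolves to an existing target.
# A relative symlink such as /var -> cdsl.local/var fails this test
# as long as /cdsl.local is still an empty directory, i.e. as long as
# the bind mount has not been done.
cdsl_link_ok() {
    [ -e "$1" ]
}
```

For a node with id 0, cdsl_bind_cmd 0 prints the command with /cluster/cdsl/0 as the source. Running cdsl_link_ok /var early in the boot would be one way to notice that the bind mount was skipped, which matches the "/var/log/dmesg: No such file or directory" symptom.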
From: Dan M. <dan...@or...> - 2009-04-24 18:06:01
|
I don't know if this is related, but it seems my distribution (EL5u2) is getting misclassified by getDistributionList as unknown and my clutype and cluster_conf also:

__repository__comoonics_bootimage__cdsl_local_dir="/cdsl.local"
__repository__comoonics_bootimage__cdsl_prefix="/cluster/cdsl"
__repository__comoonics_bootimage__distribution="unknown"
__repository__comoonics_bootimage__shortdistribution="unknown"
__repository__comoonics_bootimage__clutype="gfs"
__repository__comoonics_bootimage__cluster_conf="/etc/cluster/cluster.conf"
__repository__comoonics_bootimage__rootfs="ocfs2"
__repository__comoonics_bootimage__chroot="0"
__repository__comoonics_bootimage__cluster_conf="/etc/cluster/cluster.conf"
__repository__comoonics_bootimage__hardwareids="eth0:00:16:3E:24:6C:79 eth1:00:16:3E:74:DC:50"
|
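A small sketch for spotting misdetected values in a parameter dump like the one in the message above. It assumes the dump has been saved with one name="value" pair per line; the function name unknown_params is made up for this example and is not an OSR command.

```shell
#!/bin/sh
# Hypothetical helper for illustration only; not an OSR command.
# Reads name="value" lines on stdin and prints the short name of every
# __repository__comoonics_bootimage__ parameter whose value is "unknown".
unknown_params() {
    sed -n 's/^__repository__comoonics_bootimage__\(.*\)="unknown"$/\1/p'
}
```

Fed with the dump quoted in this thread, it would list distribution and shortdistribution. Values that are set but wrong, like clutype="gfs" on an ocfs2 root, still have to be checked by eye.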
From: Dan M. <dan...@or...> - 2009-04-23 23:20:38
|
A couple more items before I quit for the day: I manually hacked in a symbolic link for /var so the boot would work better and turned on -x in the bootsr script. I am also seeing a couple of messages of the type: Error: cannot find /var/comoonics/chrootpath and I see that message comes from manage_chroot.sh > -----Original Message----- > From: Dan Magenheimer > Sent: Thursday, April 23, 2009 4:42 PM > To: Marc Grimme > Cc: ope...@li... > Subject: Re: [OSR-users] New Preview RPMs for next Release > Candidate of > comoonics-bootimage > > > Hi Marc -- > > Thanks for the help. I got past my rpm problems and am > now much further along but have hit another roadblock. > > First, FYI, I am using a different approach then the > RHEL5 OCFS2 Shared Root Mini Howto, because of the > way that I want to use the shared root. Specifically, > I am first building and booting a root ocfs2 filesystem, > using my own kernel (2.6.29) and my own initrd.img. With > this (and before I install any OSR stuff), I am able > to boot it as a Xen paravirtualized guest, using > the Xen kernel= and ramdisk= config options; the root > disk is NOT an LVM because I don't need a /boot. The > 2.6.29 kernel has CONFIG_IP_PNP and I pass in the > IP address and hostname from the Xen config file. > This all seems to work fine (for a single node). > > Next, I install the OSR rpms directly in the running > ocfs2-root guest, then shut it down. > > Next, I mount the ocfs2-root-disk from another guest > (that also has the OSR rpms installed) and follow > the howto steps to create the cdsl infrastructure > and links. Then I shut down the other guest. > > Next, I try to boot the OSR-modified ocfs2-root guest, > but it has problems. It appears that /var doesn't > exist as I get many messages such as: > > /etc/rc.d/rc.sysinit: /var/log/dmesg: No such file or directory > > and then the boot process seems to hang trying to start the > System Logger. 
No, it just takes a very long time and > eventually I get to a login prompt. (Or I can boot > single-user mode and get the same error messages, but > get to a bash prompt.) > > With a "ls -l /var", I see: > > lrwxrwxrwx 1 root root 14 <date> /var -> cdsl.local/var > > (Note no leading / before cdsl.local) > > but "ls -ld /cdsl.local" shows it is empty. > > Browsing around, I see that /cluster/cdsl is populated > (with subdirectories 0 ... 7 and default) and each has > an etc and a var subdirectory. /cluster/shared has > a var subdirectory and a var/lib subdirectory. > > So I'm guessing that cdsl.local should somehow be > linked to /cluster but isn't. True? > > One other thing I should mention... since my cluster.conf > has 8 nodes numbered 0 to 7, in the "mount --bind" > command during the cdsl setup steps, I used cluster/cdsl/0 > instead of cluster/cdsl/1 to bind to cdsl.local. > > Any ideas? Maybe your initrd creates some necessary links > and mine does not? (I tried booting with your initrd, > but my ocfs2-root failed to mount giving a kernel panic... > have you tested with linux-2.6.29? The error message > "Heartbeat has to be started to mount a read-write > clustered device" looks like it comes from a somewhat > recent ocfs2 kernel patch I found here: > http://www.mail-archive.com/ocf...@os.../msg00293.html > and I worked around it by mounting with -o heartbeat=local) > > Sorry this is so long! > > Thanks, > Dan > > > -----Original Message----- > > From: Marc Grimme [mailto:gr...@at...] > > Sent: Wednesday, April 22, 2009 12:39 AM > > To: Dan Magenheimer > > Cc: ope...@li... > > Subject: Re: [OSR-users] New Preview RPMs for next Release > > Candidate of > > comoonics-bootimage > > > > > > Hi Dan, > > On Wednesday 22 April 2009 02:20:15 Dan Magenheimer wrote: > > > Hi Marc -- > > > > > > cc'ing the list this time... > > > > > > I want to get EL5 OCFS2 shared-root working with > > > the latest preview rpms. 
> > > > > > Here's what I downloaded from download.atix.de: > > > > > > * ocfs2-tools-debuginfo-1.4.1-1.el5 > > > * ocfs2-tools-1.4.1-1.el5 > > > * ocfs2-tools-devel-1.4.1-1.el5 > > > * ocfs2console-1.4.1-1.el5 > > > * comoonics-pythonosfix-py-0.1-2 > > > * comoonics-bootimage-listfiles-1.3-8.el5 > > > * SysVinit-comoonics-2.86-14.atix.1.i386 > > > * comoonics-cluster-py-0.1-17 > > > * comoonics-cdsl-py-0.2-11 > > > * comoonics-bootimage-1.4-19 > > > * comoonics-cs-py-0.1-56 > > > * comoonics-bootimage-initscripts-1.4.9.rhel5 > > > * comoonics-bootimage-extras-ocfs2-0.1-3 > > > * comoonics-bootimage-extras-xen-0.1-5 > > > > > > I did not find any version of this that was mentioned > > > in the how-to: > > > > > > * comoonics-release-0.1-1 > > This one isn't there yet ;-) . > > > > > > and I couldn't find an el5 version of this, just rhel5: > > > > > > * comoonics-bootimage-initscripts-1.4.9.rhel5 > > That's perfectly ok. > > You just forgot the comoonics-bootimage-listfiles-rhel, > > comoonics-bootimage-listfiles-rhel5. > > > > Here is a list of rpms of my ocfs2cluster: > > [root@ocfs2-node2 ~]# rpm -qa 'comoonics-*' > > comoonics-cluster-py-0.1-17 > > comoonics-bootimage-listfiles-rhel-0.1-3 > > comoonics-bootimage-listfiles-rhel5-0.1-3 > > comoonics-bootimage-listfiles-all-0.1-5 > > comoonics-cdsl-py-0.2-12 > > comoonics-bootimage-extras-ocfs2-0.1-3 > > comoonics-pythonosfix-py-0.1-2 > > comoonics-bootimage-initscripts-1.4-10.rhel5 > > comoonics-cs-py-0.1-56 > > comoonics-bootimage-listfiles-1.3-8.rhel5 > > comoonics-bootimage-extras-xen-0.1-5 > > comoonics-bootimage-1.4-20 > > > > > > > > and installing the rpms above, I got failed dependencies > > > for: > > > > > > comoonics-bootimage-initscripts-rhel > > Could you recall the exact output of rpm? 
> > > > > > and > > > > > > comoonics-bootimage-initscripts-rhel5 > > > > > > and couldn't find these in download.atix.de > > Right it is comoonics-bootimage-initscripts-1.4-10rhel5 > > > > > > Last, I was interested in looking at your > > > > > > /opt/atix/comoonics-bootimage/mkinitrd > > Hm. Again my cluster shows the following > > [root@ocfs2-node2 ~]# ll /opt/atix/comoonics-bootimage/ > > total 48 > > drwxr-xr-x 7 root root 4096 Apr 21 21:48 boot-scripts > > -rwxr-xr-x 1 root root 15338 Apr 21 09:18 > create-gfs-initrd-generic.sh > > -rw-r--r-- 1 root root 4308 Apr 21 09:18 create-gfs-initrd-lib.sh > > -rwxr-xr-x 1 root root 12360 Apr 21 09:18 manage_chroot.sh > > lrwxrwxrwx 1 root root 58 Apr 21 21:48 > > mkinitrd -> > /opt/atix/comoonics-bootimage/create-gfs-initrd-generic.sh > > drwxr-xr-x 2 root root 4096 Apr 21 21:48 patches > > [root@ocfs2-node2 ~]# > > rpm -qf /opt/atix/comoonics-bootimage/create-gfs-initrd-generic.sh > > comoonics-bootimage-1.4-20 > > > > There should be a symlink > > from /opt/atix/comoonics-bootimage/create-gfs-initrd-generic.sh > > => /opt/atix/comoonics-bootimage/mkinitrd > > > > > > > > script but it doesn't seem to be contained in any > > > of the rpms. > > > > > > Help appreciated! > > > > > Marc. > > > > -- > > Gruss / Regards, > > > > Marc Grimme > > http://www.atix.de/ http://www.open-sharedroot.org/ > > > > > > -------------------------------------------------------------- > ---------------- > Crystal Reports - New Free Runtime and 30 Day Trial > Check out the new simplified licensign option that enables unlimited > royalty-free distribution of the report engine for > externally facing > server and web deployment. > http://p.sf.net/sfu/businessobjects > _______________________________________________ > Open-sharedroot-users mailing list > Ope...@li... > https://lists.sourceforge.net/lists/listinfo/open-sharedroot-users > |
From: Dan M. <dan...@or...> - 2009-04-23 22:42:41
|
Hi Marc -- Thanks for the help. I got past my rpm problems and am now much further along but have hit another roadblock. First, FYI, I am using a different approach then the RHEL5 OCFS2 Shared Root Mini Howto, because of the way that I want to use the shared root. Specifically, I am first building and booting a root ocfs2 filesystem, using my own kernel (2.6.29) and my own initrd.img. With this (and before I install any OSR stuff), I am able to boot it as a Xen paravirtualized guest, using the Xen kernel= and ramdisk= config options; the root disk is NOT an LVM because I don't need a /boot. The 2.6.29 kernel has CONFIG_IP_PNP and I pass in the IP address and hostname from the Xen config file. This all seems to work fine (for a single node). Next, I install the OSR rpms directly in the running ocfs2-root guest, then shut it down. Next, I mount the ocfs2-root-disk from another guest (that also has the OSR rpms installed) and follow the howto steps to create the cdsl infrastructure and links. Then I shut down the other guest. Next, I try to boot the OSR-modified ocfs2-root guest, but it has problems. It appears that /var doesn't exist as I get many messages such as: /etc/rc.d/rc.sysinit: /var/log/dmesg: No such file or directory and then the boot process seems to hang trying to start the System Logger. No, it just takes a very long time and eventually I get to a login prompt. (Or I can boot single-user mode and get the same error messages, but get to a bash prompt.) With a "ls -l /var", I see: lrwxrwxrwx 1 root root 14 <date> /var -> cdsl.local/var (Note no leading / before cdsl.local) but "ls -ld /cdsl.local" shows it is empty. Browsing around, I see that /cluster/cdsl is populated (with subdirectories 0 ... 7 and default) and each has an etc and a var subdirectory. /cluster/shared has a var subdirectory and a var/lib subdirectory. So I'm guessing that cdsl.local should somehow be linked to /cluster but isn't. True? One other thing I should mention... 
since my cluster.conf has 8 nodes numbered 0 to 7, in the "mount --bind" command during the cdsl setup steps, I used cluster/cdsl/0 instead of cluster/cdsl/1 to bind to cdsl.local. Any ideas? Maybe your initrd creates some necessary links and mine does not? (I tried booting with your initrd, but my ocfs2-root failed to mount giving a kernel panic... have you tested with linux-2.6.29? The error message "Heartbeat has to be started to mount a read-write clustered device" looks like it comes from a somewhat recent ocfs2 kernel patch I found here: http://www.mail-archive.com/ocf...@os.../msg00293.html and I worked around it by mounting with -o heartbeat=local) Sorry this is so long! Thanks, Dan > -----Original Message----- > From: Marc Grimme [mailto:gr...@at...] > Sent: Wednesday, April 22, 2009 12:39 AM > To: Dan Magenheimer > Cc: ope...@li... > Subject: Re: [OSR-users] New Preview RPMs for next Release > Candidate of > comoonics-bootimage > > > Hi Dan, > On Wednesday 22 April 2009 02:20:15 Dan Magenheimer wrote: > > Hi Marc -- > > > > cc'ing the list this time... > > > > I want to get EL5 OCFS2 shared-root working with > > the latest preview rpms. > > > > Here's what I downloaded from download.atix.de: > > > > * ocfs2-tools-debuginfo-1.4.1-1.el5 > > * ocfs2-tools-1.4.1-1.el5 > > * ocfs2-tools-devel-1.4.1-1.el5 > > * ocfs2console-1.4.1-1.el5 > > * comoonics-pythonosfix-py-0.1-2 > > * comoonics-bootimage-listfiles-1.3-8.el5 > > * SysVinit-comoonics-2.86-14.atix.1.i386 > > * comoonics-cluster-py-0.1-17 > > * comoonics-cdsl-py-0.2-11 > > * comoonics-bootimage-1.4-19 > > * comoonics-cs-py-0.1-56 > > * comoonics-bootimage-initscripts-1.4.9.rhel5 > > * comoonics-bootimage-extras-ocfs2-0.1-3 > > * comoonics-bootimage-extras-xen-0.1-5 > > > > I did not find any version of this that was mentioned > > in the how-to: > > > > * comoonics-release-0.1-1 > This one isn't there yet ;-) . 
> > > > and I couldn't find an el5 version of this, just rhel5: > > > > * comoonics-bootimage-initscripts-1.4.9.rhel5 > That's perfectly ok. > You just forgot the comoonics-bootimage-listfiles-rhel, > comoonics-bootimage-listfiles-rhel5. > > Here is a list of rpms of my ocfs2cluster: > [root@ocfs2-node2 ~]# rpm -qa 'comoonics-*' > comoonics-cluster-py-0.1-17 > comoonics-bootimage-listfiles-rhel-0.1-3 > comoonics-bootimage-listfiles-rhel5-0.1-3 > comoonics-bootimage-listfiles-all-0.1-5 > comoonics-cdsl-py-0.2-12 > comoonics-bootimage-extras-ocfs2-0.1-3 > comoonics-pythonosfix-py-0.1-2 > comoonics-bootimage-initscripts-1.4-10.rhel5 > comoonics-cs-py-0.1-56 > comoonics-bootimage-listfiles-1.3-8.rhel5 > comoonics-bootimage-extras-xen-0.1-5 > comoonics-bootimage-1.4-20 > > > > > and installing the rpms above, I got failed dependencies > > for: > > > > comoonics-bootimage-initscripts-rhel > Could you recall the exact output of rpm? > > > > and > > > > comoonics-bootimage-initscripts-rhel5 > > > > and couldn't find these in download.atix.de > Right it is comoonics-bootimage-initscripts-1.4-10rhel5 > > > > Last, I was interested in looking at your > > > > /opt/atix/comoonics-bootimage/mkinitrd > Hm. 
Again my cluster shows the following > [root@ocfs2-node2 ~]# ll /opt/atix/comoonics-bootimage/ > total 48 > drwxr-xr-x 7 root root 4096 Apr 21 21:48 boot-scripts > -rwxr-xr-x 1 root root 15338 Apr 21 09:18 create-gfs-initrd-generic.sh > -rw-r--r-- 1 root root 4308 Apr 21 09:18 create-gfs-initrd-lib.sh > -rwxr-xr-x 1 root root 12360 Apr 21 09:18 manage_chroot.sh > lrwxrwxrwx 1 root root 58 Apr 21 21:48 > mkinitrd -> /opt/atix/comoonics-bootimage/create-gfs-initrd-generic.sh > drwxr-xr-x 2 root root 4096 Apr 21 21:48 patches > [root@ocfs2-node2 ~]# > rpm -qf /opt/atix/comoonics-bootimage/create-gfs-initrd-generic.sh > comoonics-bootimage-1.4-20 > > There should be a symlink > from /opt/atix/comoonics-bootimage/create-gfs-initrd-generic.sh > => /opt/atix/comoonics-bootimage/mkinitrd > > > > > script but it doesn't seem to be contained in any > > of the rpms. > > > > Help appreciated! > > > Marc. > > -- > Gruss / Regards, > > Marc Grimme > http://www.atix.de/ http://www.open-sharedroot.org/ > > |
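The symptom described in the message above (/var pointing at cdsl.local/var while /cdsl.local itself is empty) can be checked with a small helper. This is only a sketch: the function name `check_cdsl` is made up for illustration, and it assumes that an empty cdsl.local directory means nothing has been bind-mounted onto it yet.

```shell
#!/bin/sh
# Sketch (hypothetical helper): report whether cdsl.local under a given
# root directory exists and contains anything.
# Usage: check_cdsl [root]    (root defaults to /)
check_cdsl() {
    root="${1:-/}"
    local_dir="${root%/}/cdsl.local"
    if [ ! -d "$local_dir" ]; then
        echo "missing: $local_dir"
        return 2
    fi
    # An empty directory suggests that no node-specific tree is bound here.
    if [ -z "$(ls -A "$local_dir")" ]; then
        echo "empty: $local_dir"
        return 1
    fi
    echo "populated: $local_dir"
    return 0
}
```

On the guest described above, `check_cdsl /` would presumably print `empty: /cdsl.local`. It should only report `populated` once something like the bind of cluster/cdsl/0 onto cdsl.local (the "mount --bind" step from the howto mentioned in the message) is in effect.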
From: Klaus S. <kla...@Ph...> - 2009-04-23 08:32:31
|
Hello,

> Last time I've seen it (and I've only seen it about two times) I saw loads of
> cached objects in /proc/slabinfo. Currently I don't think it's related to
> using osr but to using the cluster or xen itself.

Now I've got a node with memory trouble.

> Next time when you see something like this it would be great if you could
> provide a sysrq-m/t and the output of /proc/slabinfo.

The sysrq didn't work, as none of the nodes responds on port 12242; probably the fenceackserver is not running (although it is configured in cluster.conf).

At least the node is halfway responsive, so I could do "cat /proc/slabinfo".

Sincerely,
Klaus |
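If the fenceacksv port cannot be reached but a root shell on the node still works, the same dumps can usually be requested through the kernel's /proc/sysrq-trigger file. The following is a sketch, not part of the OSR tools: the function name `sysrq_dump` is invented here, and it assumes the kernel was built with magic sysrq support. The output goes to the kernel log, so the syslog redirection mentioned elsewhere in this thread still applies.

```shell
#!/bin/sh
# Sketch (hypothetical helper): ask the kernel for a sysrq dump.
# key:     m (memory info) or t (task list)
# trigger: file to write to; a parameter so the function can be tried
#          on a scratch file instead of the real trigger
sysrq_dump() {
    key="$1"
    trigger="${2:-/proc/sysrq-trigger}"
    case "$key" in
        m|t) ;;
        *) echo "unsupported key: $key" >&2; return 2 ;;
    esac
    if [ ! -w "$trigger" ]; then
        echo "cannot write to $trigger (root required?)" >&2
        return 1
    fi
    printf '%s' "$key" > "$trigger"
}
```

For example, `sysrq_dump m` followed by `sysrq_dump t`, run as root, would correspond to the "memory" and "tasks" commands of the fenceacksv.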
From: Marc G. <gr...@at...> - 2009-04-22 06:39:05
|
Hi Dan, On Wednesday 22 April 2009 02:20:15 Dan Magenheimer wrote: > Hi Marc -- > > cc'ing the list this time... > > I want to get EL5 OCFS2 shared-root working with > the latest preview rpms. > > Here's what I downloaded from download.atix.de: > > * ocfs2-tools-debuginfo-1.4.1-1.el5 > * ocfs2-tools-1.4.1-1.el5 > * ocfs2-tools-devel-1.4.1-1.el5 > * ocfs2console-1.4.1-1.el5 > * comoonics-pythonosfix-py-0.1-2 > * comoonics-bootimage-listfiles-1.3-8.el5 > * SysVinit-comoonics-2.86-14.atix.1.i386 > * comoonics-cluster-py-0.1-17 > * comoonics-cdsl-py-0.2-11 > * comoonics-bootimage-1.4-19 > * comoonics-cs-py-0.1-56 > * comoonics-bootimage-initscripts-1.4.9.rhel5 > * comoonics-bootimage-extras-ocfs2-0.1-3 > * comoonics-bootimage-extras-xen-0.1-5 > > I did not find any version of this that was mentioned > in the how-to: > > * comoonics-release-0.1-1 This one isn't there yet ;-) . > > and I couldn't find an el5 version of this, just rhel5: > > * comoonics-bootimage-initscripts-1.4.9.rhel5 That's perfectly ok. You just forgot the comoonics-bootimage-listfiles-rhel, comoonics-bootimage-listfiles-rhel5. Here is a list of rpms of my ocfs2cluster: [root@ocfs2-node2 ~]# rpm -qa 'comoonics-*' comoonics-cluster-py-0.1-17 comoonics-bootimage-listfiles-rhel-0.1-3 comoonics-bootimage-listfiles-rhel5-0.1-3 comoonics-bootimage-listfiles-all-0.1-5 comoonics-cdsl-py-0.2-12 comoonics-bootimage-extras-ocfs2-0.1-3 comoonics-pythonosfix-py-0.1-2 comoonics-bootimage-initscripts-1.4-10.rhel5 comoonics-cs-py-0.1-56 comoonics-bootimage-listfiles-1.3-8.rhel5 comoonics-bootimage-extras-xen-0.1-5 comoonics-bootimage-1.4-20 > > and installing the rpms above, I got failed dependencies > for: > > comoonics-bootimage-initscripts-rhel Could you recall the exact output of rpm? 
> > and > > comoonics-bootimage-initscripts-rhel5 > > and couldn't find these in download.atix.de Right it is comoonics-bootimage-initscripts-1.4-10rhel5 > > Last, I was interested in looking at your > > /opt/atix/comoonics-bootimage/mkinitrd Hm. Again my cluster shows the following [root@ocfs2-node2 ~]# ll /opt/atix/comoonics-bootimage/ total 48 drwxr-xr-x 7 root root 4096 Apr 21 21:48 boot-scripts -rwxr-xr-x 1 root root 15338 Apr 21 09:18 create-gfs-initrd-generic.sh -rw-r--r-- 1 root root 4308 Apr 21 09:18 create-gfs-initrd-lib.sh -rwxr-xr-x 1 root root 12360 Apr 21 09:18 manage_chroot.sh lrwxrwxrwx 1 root root 58 Apr 21 21:48 mkinitrd -> /opt/atix/comoonics-bootimage/create-gfs-initrd-generic.sh drwxr-xr-x 2 root root 4096 Apr 21 21:48 patches [root@ocfs2-node2 ~]# rpm -qf /opt/atix/comoonics-bootimage/create-gfs-initrd-generic.sh comoonics-bootimage-1.4-20 There should be a symlink from /opt/atix/comoonics-bootimage/create-gfs-initrd-generic.sh => /opt/atix/comoonics-bootimage/mkinitrd > > script but it doesn't seem to be contained in any > of the rpms. > > Help appreciated! > Marc. -- Gruss / Regards, Marc Grimme http://www.atix.de/ http://www.open-sharedroot.org/ |
From: Dan M. <dan...@or...> - 2009-04-22 00:20:33
|
Hi Marc -- cc'ing the list this time... I want to get EL5 OCFS2 shared-root working with the latest preview rpms. Here's what I downloaded from download.atix.de: * ocfs2-tools-debuginfo-1.4.1-1.el5 * ocfs2-tools-1.4.1-1.el5 * ocfs2-tools-devel-1.4.1-1.el5 * ocfs2console-1.4.1-1.el5 * comoonics-pythonosfix-py-0.1-2 * comoonics-bootimage-listfiles-1.3-8.el5 * SysVinit-comoonics-2.86-14.atix.1.i386 * comoonics-cluster-py-0.1-17 * comoonics-cdsl-py-0.2-11 * comoonics-bootimage-1.4-19 * comoonics-cs-py-0.1-56 * comoonics-bootimage-initscripts-1.4.9.rhel5 * comoonics-bootimage-extras-ocfs2-0.1-3 * comoonics-bootimage-extras-xen-0.1-5 I did not find any version of this that was mentioned in the how-to: * comoonics-release-0.1-1 and I couldn't find an el5 version of this, just rhel5: * comoonics-bootimage-initscripts-1.4.9.rhel5 and installing the rpms above, I got failed dependencies for: comoonics-bootimage-initscripts-rhel and comoonics-bootimage-initscripts-rhel5 and couldn't find these in download.atix.de Last, I was interested in looking at your /opt/atix/comoonics-bootimage/mkinitrd script but it doesn't seem to be contained in any of the rpms. Help appreciated! Dan > -----Original Message----- > From: Marc Grimme [mailto:gr...@at...] > Sent: Friday, April 17, 2009 8:45 AM > To: Dan Magenheimer > Subject: Re: [OSR-users] New Preview RPMs for next Release > Candidate of > comoonics-bootimage > > > Hi Dan, > sorry for the late answer, but I was very busy with releasing > the next bunch > of osr rpms. > So I now think the most recent preview version should be the > best also for > you. > Be sure to install the comoonics-bootimage series 1.4. > > Let me know if you have problems. > > Regards Marc. > On Thursday 09 April 2009 20:38:57 Dan Magenheimer wrote: > > Hi Marc -- > > > > I finally have some time to try open-sharedroot again. > > My enviroment is OCFS2-on-RHEL5 (on Xen). Should I wait > > until this release is done? 
Will the versions of the > > RPMS (listed in the RHEL4 OCFS2 how-to) be all different? > > > > Thanks, > > Dan > > > > > -----Original Message----- > > > From: Marc Grimme [mailto:gr...@at...] > > > Sent: Friday, March 27, 2009 10:18 AM > > > To: ope...@li... > > > Cc: ope...@li... > > > Subject: Re: [OSR-users] New Preview RPMs for next Release > > > Candidate of > > > comoonics-bootimage > > > > > > > > > Hello *, > > > I finished the last implementations and bugfixes of the > > > (nearly) final release > > > candidate of the new comoonics tools (Version 4.5 including > > > comoonics-bootimage). > > > Here are the last relevant changes: > > > > > > * Some bugfixes in hardware detection specially with udev > > > renaming nics and > > > therefore "deadlocking" udevd (timeout gets called) > > > * Bugfixes in powering up nics. They might be called multiple > > > times (i.e. when > > > using bridging and bonding and vlans at once ;-) ) > > > * Verified ocfs2 with RHEL5 and ocfs2 with SLES10 > > > * Updated the lite option in mkinitrd <gordans approch> to be > > > more general and > > > not necessarily need "lsmod". > > > * Implemented filters to globally remove files from initrd > > > (Gordan's idea) > > > * Implemented an update feature in the initrd (see > > > https://bugzilla.atix.de/show_bug.cgi?id=340). That's a cool > > > one. Updating an > > > initrd within seconds. > > > * Fixed bugs (338, 331, ..) > > > > > > Some thoughts to sum up: > > > 1. It is now possible to use the lite version of mkinitrd. It > > > should reduce > > > the size of the initrd by 50%. > > > Use -L if you want to include all loaded modules or if you > > > need more modules > > > use either -M <modulename> or specify > > > in /etc/comoonics/comoonics-bootimage.cfg > > > 2. The mkinitrd -U updates an existing initrd an therefore > > > also speeds up the > > > initrd building. With -A <kernelversion> and -D > > > <kernelversion> even kernels > > > can be exchanged in the same initrd. 
> > > > > > So that's it. > > > > > > Open issues before having a proper RC is: > > > Retest with SLES10(ocfs2/nfs) and test with FC10(nfs/gfs2). > > > > > > That's it, let me know what you think and have fun. > > > > > > NOTE: Those packages are NOT YET production ready. And they > > > might include > > > bugs as they did not undergo the Q&A. So use them at your > own risc. > > > > > > P.S. And may the root always be shared! ;-) > > > > > > On Tuesday 10 February 2009 20:53:18 Marc Grimme wrote: > > > > Hello *, > > > > finally we got the first stage of the new version of the > > > > > > osr cluster suite > > > > > > > in the preview channel. This will be the first release > > > > > > candidate for the > > > > > > > next productive channel. > > > > > > > > NOTE: Those packages are NOT YET production ready. And they > > > > > > might include > > > > > > > bugs as they did not undergo the Q&A. So use them at > your own risc. > > > > > > > > 2nd NOTE: Those packages were only tested with OSR<RHEL5> > > > > > > clusters based on > > > > > > > GFS or NFS. OSR<SLES10/ocfs2>,OSR<Fedora/gfs2|nfs> and > > > > > > OSR<RHEL5/ocfs2> > > > > > > > will be tested next. This means you shouldn't use those > > > > > > packages with ocfs2 > > > > > > > as rootfilesystem. 
> > > > > > > > As this is only a preview update I will sum up the most > > > > > > important changes > > > > > > > not automated: > > > > > > > > - Rewritten hardwaredetection: > > > > > > https://bugzilla.atix.de/show_bug.cgi?id=325 > > > > > > > - Usability review: https://bugzilla.atix.de/show_bug.cgi?id=323 > > > > - Support for breakpoints in bootprocess > > > > - Break shell will always continue the bootprocess when exited > > > > - Bootparameters may be overwritten in Breakshell > > > > - Rootfs Check with bootparameter rootfsck (use with care > > > > > > and up to now > > > > > > > only with GFS) > > > > - NFS OSR will not start scsi detection > > > > - Preview support for OSR on fedora (NFS) > > > > - Preview support for OSR<NFS4> without authentication > > > > - Preview support for OSR<RHEL5 with localfs> > > > > - Many small bugfixes > > > > Patches from Gordan Bobic (all preview status): > > > > - glusterfs support > > > > - Software RAID with md support > > > > - diet patch (new option to mkinitrd <-l> to make > initrd smaller) > > > > > > > > Please let me know what you think > > > > > > > > Thanks > > > > Marc > > > > > > > > P.S. May the root always be shared! ;-) > > > > > > > > -- > > > > Gruss / Regards, > > > > > > > > Marc Grimme > > > > http://www.atix.de/ > http://www.open-sharedroot.org/ > > > > > > -------------------------------------------------------------- > > > ------------- > > > > > > >--- Create and Deploy Rich Internet Apps outside the browser with > > > > Adobe(R)AIR(TM) software. With Adobe AIR, Ajax developers > > > > > > can use existing > > > > > > > skills and code to build responsive, highly engaging > > > > > > applications that > > > > > > > combine the power of local resources and data with the > > > > > > reach of the web. 
> > > > > > > Download the Adobe AIR SDK and Ajax docs to start building > > > > > > applications > > > > > > > today-http://p.sf.net/sfu/adobe-com > > > > _______________________________________________ > > > > Open-sharedroot-users mailing list > > > > Ope...@li... > > > > > https://lists.sourceforge.net/lists/listinfo/open-sharedroot-users > > > > > > -- > > > Gruss / Regards, > > > > > > Marc Grimme > > > http://www.atix.de/ http://www.open-sharedroot.org/ > > > > > > > > > -------------------------------------------------------------- > > > ---------------- > > > _______________________________________________ > > > Open-sharedroot-users mailing list > > > Ope...@li... > > > https://lists.sourceforge.net/lists/listinfo/open-sharedroot-users > > > > -- > Gruss / Regards, > > Marc Grimme > Phone: +49-89 452 3538-14 > http://www.atix.de/ http://www.open-sharedroot.org/ > > ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 | > 85716 Unterschleissheim | www.atix.de | www.open-sharedroot.org > > Registergericht: Amtsgericht Muenchen, Registernummer: HRB > 168930, USt.-Id.: > DE209485962 | Vorstand: Marc Grimme, Mark Hlawatschek, Thomas > Merz (Vors.) | > Vorsitzender des Aufsichtsrats: Dr. Martin Buss > > |
From: Marc G. <gr...@at...> - 2009-04-16 09:09:59
|
On Thursday 16 April 2009 10:44:25 Klaus Steinberger wrote: > Hello, > > we got a trouble with our OSR Cluster (SL 5.3), sometimes a node runs > unexpectedly out of memory and goes wild. > > For example I just started a mkinitrd and during this process the node > goes unresponsive. Looking at the console I saw many "Out of memory" > messages and killing some processes. > > Our current configuration: > > Three nodes running SL 5.3 > > Each node's memory is limited to 2 GByte, due to the fact that we run > XEN on top of it (we had bad experiences with ballooning dom0, so we > limited the memory) > > Normally the nodes need around 1 GByte: > > [root@aule ~]# cat /proc/meminfo > MemTotal: 2097152 kB > MemFree: 1197244 kB > Buffers: 7408 kB > Cached: 335876 kB > SwapCached: 0 kB > Active: 127860 kB > Inactive: 308336 kB > HighTotal: 0 kB > HighFree: 0 kB > LowTotal: 2097152 kB > LowFree: 1197244 kB > SwapTotal: 0 kB > SwapFree: 0 kB > Dirty: 152 kB > Writeback: 0 kB > AnonPages: 92852 kB > Mapped: 22608 kB > Slab: 93820 kB > PageTables: 5256 kB > NFS_Unstable: 0 kB > Bounce: 0 kB > CommitLimit: 1048576 kB > Committed_AS: 1747212 kB > VmallocTotal: 34359738367 kB > VmallocUsed: 4828 kB > VmallocChunk: 34359733487 kB > [root@aule ~]# > > > What would be the best way to add SWAP to a node? > Or any other idea? Or hust extende the dom0 Memory? I've seen this behaviour very rarely (only on clusters using xen and redhat cluster) but there seems to be some memory leak somewhere. Last time I've seen it (and I've only seen it about two times) I saw loads of cached objects in /proc/slabinfo. Currently I don't think it's related to using osr but to using the cluster or xen itself. Next time when you see something like this it would be great if you could provide a sysrq-m/t and the output of /proc/slabinfo. Just log in to the node having no memory via telnet <nodename> 12242 then type shell (you're now in a shell in the rescue chroot/fenceacksv). 
Next, type cat /proc/slabinfo. Then exit from the shell and, back in the fenceacksv, type: memory (for sysrq-m), then tasks (for sysrq-t). As a prerequisite you should redirect the syslog to a central syslog server to capture the memory dump and the task list. Also be aware that, if sysrq-t/m takes very long to complete, the other nodes might start fencing this one. But as you might have to restart this node anyway, it doesn't hurt too much. Sorry, but currently I don't see any other option, as this happens very rarely and I could only once trace it down a little further. > > Sincerely, > Klaus Steinberger BTW: as you are using gfs/rgmanager and xen you should be aware of these Red Hat bugzillas: 485026, 490449, 487214, 468691 -- Gruss / Regards, Marc Grimme http://www.atix.de/ http://www.open-sharedroot.org/ |
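To spot the "loads of cached objects" mentioned above, the slabinfo output can be reduced to the largest caches. This is a rough sketch: the function name `top_slabs` is made up, and it assumes the slabinfo 2.x column layout (name, active_objs, num_objs, objsize, ...). The size is estimated as num_objs * objsize, which ignores per-slab overhead.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the N largest slab caches as
# "name size_in_kB", sorted by estimated size (num_objs * objsize).
# Usage: top_slabs [file] [count]
top_slabs() {
    file="${1:-/proc/slabinfo}"
    count="${2:-10}"
    # Skip the version line and the "# name ..." header line.
    awk '!/^(slabinfo|#)/ { printf "%s %d\n", $1, ($3 * $4) / 1024 }' "$file" |
        sort -k2,2nr | head -n "$count"
}
```

Running `top_slabs` on a healthy node and again on a node that is running out of memory gives two short lists that are easier to compare than the full /proc/slabinfo dumps.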
From: Klaus S. <kla...@Ph...> - 2009-04-16 08:44:38
|
Hello,

we have a problem with our OSR cluster (SL 5.3): sometimes a node unexpectedly runs out of memory and goes wild.

For example, I just started a mkinitrd and during this process the node became unresponsive. Looking at the console I saw many "Out of memory" messages and some processes being killed.

Our current configuration:

Three nodes running SL 5.3

Each node's memory is limited to 2 GByte, because we run Xen on top of it (we had bad experiences with ballooning dom0, so we limited the memory)

Normally the nodes need around 1 GByte:

[root@aule ~]# cat /proc/meminfo
MemTotal: 2097152 kB
MemFree: 1197244 kB
Buffers: 7408 kB
Cached: 335876 kB
SwapCached: 0 kB
Active: 127860 kB
Inactive: 308336 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 2097152 kB
LowFree: 1197244 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 152 kB
Writeback: 0 kB
AnonPages: 92852 kB
Mapped: 22608 kB
Slab: 93820 kB
PageTables: 5256 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 1048576 kB
Committed_AS: 1747212 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 4828 kB
VmallocChunk: 34359733487 kB
[root@aule ~]#

What would be the best way to add swap to a node?
Or any other idea? Or just extend the dom0 memory?

Sincerely,
Klaus Steinberger |
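The figures in the message above can be condensed into a few numbers. With the posted values, MemTotal - MemFree is 899908 kB (roughly the "around 1 GByte" mentioned), and subtracting Buffers and Cached leaves 556624 kB. A minimal sketch of that arithmetic follows; the function name `mem_summary` and the output labels are made up for illustration.

```shell
#!/bin/sh
# Sketch (hypothetical helper): summarize memory use from a
# /proc/meminfo style file. All values are in kB.
# Usage: mem_summary [file]
mem_summary() {
    file="${1:-/proc/meminfo}"
    awk '
        $1 == "MemTotal:"  { total = $2 }
        $1 == "MemFree:"   { free = $2 }
        $1 == "Buffers:"   { buffers = $2 }
        $1 == "Cached:"    { cached = $2 }
        $1 == "SwapTotal:" { swap = $2 }
        END {
            printf "used_incl_cache_kB %d\n", total - free
            printf "used_excl_cache_kB %d\n", total - free - buffers - cached
            printf "swap_total_kB %d\n", swap
        }' "$file"
}
```

Logging this summary at regular intervals on each node might show whether usage grows steadily before the out-of-memory condition or jumps suddenly, for example while mkinitrd is running.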