From: Dan M. <dan...@or...> - 2009-04-28 22:22:04
FYI, the FATAL messages appear to be harmless and I have
now successfully booted to a login prompt, then booted
a second and third VM using the OSR also to login prompts!
Thanks very much for your help!

I *think* an ocfs2 kernel patch is required, or possibly
a "-o heartbeat=local" option might need to be specified.
Let me reproduce everything and check on that.

If you have any changed/newer RPMs I should use, please
send them or send download URLs. Otherwise, I will use
this list and hack in any changes needed.

Thanks,
Dan

comoonics-bootimage-1.4-19.noarch.rpm
comoonics-bootimage-extras-ocfs2-0.1-3.noarch.rpm
comoonics-bootimage-extras-xen-0.1-5.noarch.rpm
comoonics-bootimage-initscripts-1.4-9.rhel5.noarch.rpm
comoonics-bootimage-listfiles-1.3-8.el5.noarch.rpm
comoonics-bootimage-listfiles-all-0.1-5.noarch.rpm
comoonics-bootimage-listfiles-rhel-0.1-3.noarch.rpm
comoonics-bootimage-listfiles-rhel5-0.1-3.noarch.rpm
comoonics-cdsl-py-0.2-12.noarch.rpm
comoonics-cluster-py-0.1-17.noarch.rpm
comoonics-cs-py-0.1-56.noarch.rpm
comoonics-pythonosfix-py-0.1-2.noarch.rpm
SysVinit-comoonics-2.86-14.atix.1.i386.rpm

> -----Original Message-----
> From: Dan Magenheimer
> Sent: Tuesday, April 28, 2009 12:01 PM
> To: Marc Grimme; ope...@li...
> Subject: RE: [OSR-users] New Preview RPMs for next Release Candidate of comoonics-bootimage
>
> > you might try the patch or file itself attached.
>
> Yes, this patch seems to get the distribution set properly.
> clutype still gets set to gfs... is that OK even if
> I'm using ocfs2?
>
> > /etc/xen should only be tested for Dom0 "NOT" DomU.
>
> I think this problem goes away with the correct distro
> setting.
>
> > > And why when "Loading modules for all found network cards"
> > > do I get "FATAL: Module xennet not found"? I have xen
> > > networking compiled into my kernel so there is no module
> > > for it. Should this be fatal?
> > Hm. Up to now it seems to be ;-) . Nowadays I only saw
> > kernels which are modularized.
> > So this is a use case where the error detection detects a
> > nonexistent error. I'll have to think about it.
>
> Perhaps also check lib/modules/build/.config to see if the
> config option is set to "=y"?
>
> I am now also seeing FATAL error reports when trying to
> load the ocfs2 modules, scsi, dm, and others. I have all
> of these compiled into my kernel.
>
> > -----Original Message-----
> > From: Marc Grimme [mailto:gr...@at...]
> > Sent: Tuesday, April 28, 2009 9:58 AM
> > To: ope...@li...
> > Cc: Dan Magenheimer
> > Subject: Re: [OSR-users] New Preview RPMs for next Release Candidate of comoonics-bootimage
> >
> > Hi Dan,
> > you might try the patch or file itself attached.
> >
> > Do a
> > --------------------------------X8-------------------------------------
> > source /opt/atix/comoonics/bootimage/boot-scripts/etc/std-lib.sh
> > sourceLibs /opt/atix/comoonics/bootimage/boot-script
> > sourceRootfsLibs /opt/atix/comoonics/bootimage/boot-script
> > getDistributionList
> > --------------------------------X8-------------------------------------
> >
> > and let me know what the output is.
> >
> > Regards Marc.
> >
> > On Tuesday 28 April 2009 17:29:16 Dan Magenheimer wrote:
> > > Hi Marc --
> > >
> > > As you probably know, Oracle's Enterprise Linux (EL)
> > > is a "clone" of Red Hat Enterprise Linux (RHEL) and is
> > > essentially identical except for bug fixes. I don't
> > > know if OSR needs to distinguish between EL and RHEL,
> > > but I'm sure you know if they do.
> > >
> > > I'm told that you can distinguish between EL5 and RHEL5
> > > for RH/EL5ga, RH/EL5u1 and RH/EL5u2 because the file
> > > /etc/redhat-release has
> > >
> > > "Enterprise Linux Enterprise Linux release 5.X (codename)"
> > >
> > > in EL but has
> > >
> > > "Red Hat Enterprise Linux release 5.x (different_codename)"
> > >
> > > in RHEL.
> > >
> > > (Sorry, I don't know all the codenames.)
> > >
> > > Also, the file /etc/enterprise-release exists on EL but
> > > not on RHEL and has the same contents as /etc/redhat-release.
> > >
> > > HOWEVER, STARTING IN RH/EL5u3, this changes. The file
> > > /etc/redhat-release is the SAME for EL5u3 and RHEL5u3:
> > >
> > > "Red Hat Enterprise Linux release 5.3 (codename)"
> > >
> > > And for EL5u3, the file /etc/enterprise-release is different
> > > than /etc/redhat-release. For EL5u3, /etc/enterprise-release has
> > >
> > > "Enterprise Linux Enterprise Linux release 5.3 (codename)"
> > >
> > > but /etc/redhat-release has:
> > >
> > > "Red Hat Enterprise Linux release 5.3 (different_codename)"
> > >
> > > It looks to me like the OSR scripts already distinguish
> > > between EL5 and RHEL5, so there is probably a bug somewhere.
> > >
> > > Hope that helps!
> > >
> > > Thanks,
> > > Dan
> > >
> > > > -----Original Message-----
> > > > From: Marc Grimme [mailto:gr...@at...]
> > > > Sent: Tuesday, April 28, 2009 1:05 AM
> > > > To: Dan Magenheimer
> > > > Cc: ope...@li...
> > > > Subject: Re: [OSR-users] New Preview RPMs for next Release Candidate of comoonics-bootimage
> > > >
> > > > On Tuesday 28 April 2009 02:51:49 Dan Magenheimer wrote:
> > > > > Attached is the output from the messages command to
> > > > > the rescueshell for a freshly created initrd (with
> > > > > a smaller lib/modules).
> > > > >
> > > > > Some messages I see on the console which do not appear
> > > > > in the "messages" output:
> > > > >
> > > > > Detecting Hardware ./etc/hardware-lib.sh: line 458:
> > > > > unknown_hardware_detect: command not found
> > > >
> > > > That's the wrong distribution detection.
> > > > This function is called as ${distribution}_hardware_detect,
> > > > which will fail in your case. Send me a cat /etc/*-release
> > > > and ls -1 /etc/*-release and I'll make a patch for it.
> > > >
> > > > > Loading modules for all found network cardsFATAL: Module
> > > > > xennet not found.
> > > > >
> > > > > error: "xen.independent_wallclock" is an unknown key
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Dan Magenheimer
> > > > > > Sent: Monday, April 27, 2009 6:14 PM
> > > > > > To: Marc Grimme; ope...@li...
> > > > > > Subject: Re: [OSR-users] New Preview RPMs for next Release Candidate of comoonics-bootimage
> > > > > >
> > > > > > Hi Marc --
> > > > > >
> > > > > > Thanks for the reply. First, let me clarify that I am
> > > > > > trying two different approaches:
> > > > > >
> > > > > > (A) Use your entire OSR RHEL5+OCFS2 howto (with
> > > > > > a Xen-paravirtualized 2.6.29 kernel running as a
> > > > > > guest on Xen-3.4.0), or
> > > > > > (B) Build my own initrd and use your OSR scripts
> > > > > > only after my initrd finishes.
> > > > > >
> > > > > > I couldn't get (A) to work... the initrd built using
> > > > > > the howto was failing very early, so I decided to
> > > > > > try with (B). I hoped that (B) would be easier because
> > > > > > your code is very general to handle many different
> > > > > > kinds of systems, and mine could be much more specific.
> > > > > >
> > > > > > HOWEVER, I just discovered one problem with (A).
> > > > > > Your mkinitrd process builds a huge (200MB) initrd
> > > > > > and there appears to be a BUG IN XEN that fails
> > > > > > to load large initrds (larger than about 100MB)!
> > > > > >
> > > > > > Your initrd is so large because my lib/modules/2.6.29
> > > > > > is very large. If I delete that from initrd built
> > > > > > using the howto, the initrd.gz is only about 24MB
> > > > > > and I am able to boot and see the ATIX logo and
> > > > > > it drops into the rescue shell (because I haven't
> > > > > > specified the MAC-Addresses I think). However other
> > > > > > boot errors complain about modules that are missing.
> > > > > > I will try to build a kernel with fewer modules and
> > > > > > see how that goes.
> > > > > >
> > > > > > Other feedback:
> > > > > >
> > > > > > I wonder if your script detecting "distribution"
> > > > > > and "shortdistribution" are correct for all versions
> > > > > > of RHEL and Oracle Enterprise Linux? I am getting
> > > > > > "unknown" for both (from listparameters in the rescue
> > > > > > shell), though I am booting Oracle Enterprise Linux 5
> > > > > > update 2.
> > > > > >
> > > > > > I also see that your test for detecting xen in xen-lib.sh
> > > > > > is not very good. You may want to test for /proc/xen
> > > > > > instead of (or in addition to) /etc/xen.
> > > > > >
> > > > > > And why when "Loading modules for all found network cards"
> > > > > > do I get "FATAL: Module xennet not found"? I have xen
> > > > > > networking compiled into my kernel so there is no module
> > > > > > for it. Should this be fatal?
> > > > > >
> > > > > > > Ok. But still we need the mac address for detecting the node's
> > > > > > > identity in the /etc/cluster/cluster.conf.
> > > > > >
> > > > > > Is that really necessary? Ocfs2 only requires the
> > > > > > node name, not the mac address. But I guess I can
> > > > > > configure the mac address in the xen guest config file
> > > > > > so I can live with this.
> > > > > >
> > > > > > Thanks,
> > > > > > Dan
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Marc Grimme [mailto:gr...@at...]
> > > > > > > Sent: Monday, April 27, 2009 12:52 AM
> > > > > > > To: ope...@li...
> > > > > > > Cc: Dan Magenheimer
> > > > > > > Subject: Re: [OSR-users] New Preview RPMs for next Release Candidate of comoonics-bootimage
> > > > > > >
> > > > > > > On Friday 24 April 2009 00:42:17 Dan Magenheimer wrote:
> > > > > > > > Hi Marc --
> > > > > > > >
> > > > > > > > Thanks for the help. I got past my rpm problems and am
> > > > > > > > now much further along but have hit another roadblock.
> > > > > > > >
> > > > > > > > First, FYI, I am using a different approach than the
> > > > > > > > RHEL5 OCFS2 Shared Root Mini Howto, because of the
> > > > > > > > way that I want to use the shared root. Specifically,
> > > > > > > > I am first building and booting a root ocfs2 filesystem,
> > > > > > > > using my own kernel (2.6.29) and my own initrd.img. With
> > > > > > > > this (and before I install any OSR stuff), I am able
> > > > > > > > to boot it as a Xen paravirtualized guest, using
> > > > > > > > the Xen kernel= and ramdisk= config options; the root
> > > > > > > > disk is NOT an LVM because I don't need a /boot. The
> > > > > > >
> > > > > > > No problem with this; my osr-ocfs2 cluster is running exactly
> > > > > > > the same. No LVM, direct boot via kernel and initrd. That
> > > > > > > should not be a problem.
> > > > > > >
> > > > > > > > 2.6.29 kernel has CONFIG_IP_PNP and I pass in the
> > > > > > > > IP address and hostname from the Xen config file.
> > > > > > > > This all seems to work fine (for a single node).
> > > > > > >
> > > > > > > Ok. But still we need the mac address for detecting the node's
> > > > > > > identity in the /etc/cluster/cluster.conf.
> > > > > > > Also you might try to set the onboot flag in the cluster.conf
> > > > > > > at the nic config to "no":
> > > > > > > <com_info>
> > > > > > > ..
> > > > > > > <eth name="eth0" mac="..." onboot="no"/>
> > > > > > > ..
> > > > > > > </com_info>
> > > > > > > I didn't test it yet (it is not YET in my test cases) but I'm
> > > > > > > pretty confident it should work.
> > > > > > >
> > > > > > > > Next, I install the OSR rpms directly in the running
> > > > > > > > ocfs2-root guest, then shut it down.
> > > > > > >
> > > > > > > Ok. so far so good.
> > > > > > >
> > > > > > > > Next, I mount the ocfs2-root-disk from another guest
> > > > > > > > (that also has the OSR rpms installed) and follow
> > > > > > > > the howto steps to create the cdsl infrastructure
> > > > > > > > and links. Then I shut down the other guest.
> > > > > > >
> > > > > > > Could you recall the exact steps and outputs?
> > > > > > >
> > > > > > > > Next, I try to boot the OSR-modified ocfs2-root guest,
> > > > > > > > but it has problems. It appears that /var doesn't
> > > > > > > > exist as I get many messages such as:
> > > > > > >
> > > > > > > The not mounting of /var is very strange. It should put you in
> > > > > > > a rescue shell.
> > > > > > > Then type messages and send me the output.
> > > > > > >
> > > > > > > > /etc/rc.d/rc.sysinit: /var/log/dmesg: No such file or
> > > > > > > > directory
> > > > > > >
> > > > > > > That's just before it tries to boot. That's somehow too
> > > > > > > far advanced.
> > > > > > >
> > > > > > > > and then the boot process seems to hang trying to start the
> > > > > > > > System Logger. No, it just takes a very long time and
> > > > > > > > eventually I get to a login prompt. (Or I can boot
> > > > > > > > single-user mode and get the same error messages, but
> > > > > > > > get to a bash prompt.)
> > > > > > > >
> > > > > > > > With a "ls -l /var", I see:
> > > > > > > >
> > > > > > > > lrwxrwxrwx 1 root root 14 <date> /var -> cdsl.local/var
> > > > > > >
> > > > > > > That's perfectly ok.
> > > > > > >
> > > > > > > > (Note no leading / before cdsl.local)
> > > > > > > >
> > > > > > > > but "ls -ld /cdsl.local" shows it is empty.
> > > > > > >
> > > > > > > That's strange.
> > > > > > >
> > > > > > > > Browsing around, I see that /cluster/cdsl is populated
> > > > > > > > (with subdirectories 0 ... 7 and default) and each has
> > > > > > > > an etc and a var subdirectory. /cluster/shared has
> > > > > > > > a var subdirectory and a var/lib subdirectory.
> > > > > > >
> > > > > > > That's again perfectly ok.
> > > > > > >
> > > > > > > > So I'm guessing that cdsl.local should somehow be
> > > > > > > > linked to /cluster but isn't. True?
> > > > > > >
> > > > > > > Right, but it is not linked but bind mounted. That means:
> > > > > > > mount --bind /cluster/cdsl/<nodeid> /cdsl.local
> > > > > > > but that's done in the initrd automatically so you should not
> > > > > > > have to bother about that.
> > > > > > >
> > > > > > > > One other thing I should mention... since my cluster.conf
> > > > > > > > has 8 nodes numbered 0 to 7, in the "mount --bind"
> > > > > > > > command during the cdsl setup steps, I used cluster/cdsl/0
> > > > > > > > instead of cluster/cdsl/1 to bind to cdsl.local.
> > > > > > >
> > > > > > > How did you "use" that? That should be done automatically,
> > > > > > > shouldn't it?
> > > > > > >
> > > > > > > > Any ideas? Maybe your initrd creates some necessary links
> > > > > > > > and mine does not? (I tried booting with your initrd,
> > > > > > > > but my ocfs2-root failed to mount giving a kernel panic...
> > > > > > > > have you tested with linux-2.6.29? The error message
> > > > > > > > "Heartbeat has to be started to mount a read-write
> > > > > > > > clustered device" looks like it comes from a somewhat
> > > > > > > > recent ocfs2 kernel patch I found here:
> > > > > > > >
> > > > > > > > http://www.mail-archive.com/ocf...@os.../msg00293.html
> > > > > > > >
> > > > > > > > and I worked around it by mounting with -o heartbeat=local)
> > > > > > > >
> > > > > > > > Sorry this is so long!
> > > > > > >
> > > > > > > What does your /etc/cluster/cluster.conf look like?
> > > > > > >
> > > > > > > --
> > > > > > > Gruss / Regards,
> > > > > > >
> > > > > > > Marc Grimme
> > > > > > > http://www.atix.de/ http://www.open-sharedroot.org/
> > > > >
> > > > > _______________________________________________
> > > > > Open-sharedroot-users mailing list
> > > > > Ope...@li...
> > > > > https://lists.sourceforge.net/lists/listinfo/open-sharedroot-users
> > > >
> > > > --
> > > > Gruss / Regards,
> > > >
> > > > Marc Grimme
> > > > Phone: +49-89 452 3538-14
> > > > http://www.atix.de/ http://www.open-sharedroot.org/
> > > >
> > > > ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 |
> > > > 85716 Unterschleissheim | www.atix.de | www.open-sharedroot.org
> > > >
> > > > Registergericht: Amtsgericht Muenchen, Registernummer: HRB 168930, USt.-Id.:
> > > > DE209485962 | Vorstand: Marc Grimme, Mark Hlawatschek, Thomas Merz (Vors.) |
> > > > Vorsitzender des Aufsichtsrats: Dr. Martin Buss
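
Editorial sketch, not part of the OSR scripts: the thread suggests checking the kernel .config for "=y" before treating a missing module as fatal. The helper below shows one way that check might look. The function names kconfig_state and load_unless_builtin are hypothetical, and the mapping from a module name such as xennet to a config option such as CONFIG_XEN_NETDEV_FRONTEND is an assumption that has to be supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper (not from comoonics-bootimage): report how a kernel
# option is set in a given .config file.
# Prints "builtin" for =y, "module" for =m, and "unset" otherwise
# (including when the file is missing or unreadable).
kconfig_state() {
    config_file=$1
    option=$2
    if grep -q "^${option}=y\$" "$config_file" 2>/dev/null; then
        echo builtin
    elif grep -q "^${option}=m\$" "$config_file" 2>/dev/null; then
        echo module
    else
        echo unset
    fi
}

# Hypothetical wrapper: skip modprobe when the driver is compiled into
# the kernel, so a missing module file is not reported as FATAL.
# Arguments: config file, config option, module name.
load_unless_builtin() {
    config_file=$1
    option=$2
    module=$3
    if [ "$(kconfig_state "$config_file" "$option")" = builtin ]; then
        echo "$module is compiled into the kernel, skipping modprobe"
        return 0
    fi
    modprobe "$module"
}
```

A call might look like `load_unless_builtin "/lib/modules/$(uname -r)/build/.config" CONFIG_XEN_NETDEV_FRONTEND xennet`, assuming the build directory and its .config are present in the initrd, which the thread does not confirm.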