From: Dan M. <dan...@or...> - 2009-04-28 15:30:04
Hi Marc --

As you probably know, Oracle's Enterprise Linux (EL) is a "clone" of Red Hat Enterprise Linux (RHEL) and is essentially identical except for bug fixes. I don't know if OSR needs to distinguish between EL and RHEL, but I'm sure you know if they do.

I'm told that you can distinguish between EL5 and RHEL5 for RH/EL5ga, RH/EL5u1 and RH/EL5u2 because the file /etc/redhat-release has

"Enterprise Linux Enterprise Linux release 5.X (codename)"

in EL but has

"Red Hat Enterprise Linux release 5.x (different_codename)"

in RHEL. (Sorry, I don't know all the codenames.) Also, the file /etc/enterprise-release exists on EL but not on RHEL and has the same contents as /etc/redhat-release.

HOWEVER, STARTING IN RH/EL5u3, this changes. The file /etc/redhat-release is the SAME for EL5u3 and RHEL5u3:

"Red Hat Enterprise Linux release 5.3 (codename)"

And for EL5u3, the file /etc/enterprise-release is different than /etc/redhat-release. For EL5u3, /etc/enterprise-release has

"Enterprise Linux Enterprise Linux release 5.3 (codename)"

but /etc/redhat-release has:

"Red Hat Enterprise Linux release 5.3 (different_codename)"

It looks to me like the OSR scripts already distinguish between EL5 and RHEL5, so there is probably a bug somewhere.

Hope that helps!

Thanks,
Dan

> -----Original Message-----
> From: Marc Grimme [mailto:gr...@at...]
> Sent: Tuesday, April 28, 2009 1:05 AM
> To: Dan Magenheimer
> Cc: ope...@li...
> Subject: Re: [OSR-users] New Preview RPMs for next Release Candidate of comoonics-bootimage
>
> On Tuesday 28 April 2009 02:51:49 Dan Magenheimer wrote:
> > Attached is the output from the messages command to
> > the rescueshell for a freshly created initrd (with
> > a smaller lib/modules).
> >
> > Some messages I see on the console which do not appear
> > in the "messages" output:
> >
> > Detecting Hardware ./etc/hardware-lib.sh: line 458:
> > unknown_hardware_detect: command not found
> That's the wrong distribution detection.
> This function is called as ${distribution}_hardware_detect, which will fail in
> your case. Send me the output of cat /etc/*-release and ls -1 /etc/*-release
> and I'll make a patch for it.
>
> > Loading modules for all found network cardsFATAL: Module xennet not found.
> >
> > error: "xen.independent_wallclock" is an unknown key
> >
> > > -----Original Message-----
> > > From: Dan Magenheimer
> > > Sent: Monday, April 27, 2009 6:14 PM
> > > To: Marc Grimme; ope...@li...
> > > Subject: Re: [OSR-users] New Preview RPMs for next Release Candidate of comoonics-bootimage
> > >
> > > Hi Marc --
> > >
> > > Thanks for the reply. First, let me clarify that I am
> > > trying two different approaches:
> > >
> > > (A) Use your entire OSR RHEL5+OCFS2 howto (with
> > >     a Xen-paravirtualized 2.6.29 kernel running as a
> > >     guest on Xen-3.4.0), or
> > > (B) Build my own initrd and use your OSR scripts
> > >     only after my initrd finishes.
> > >
> > > I couldn't get (A) to work... the initrd built using
> > > the howto was failing very early, so I decided to
> > > try with (B). I hoped that (B) would be easier because
> > > your code is very general to handle many different
> > > kinds of systems, and mine could be much more specific.
> > >
> > > HOWEVER, I just discovered one problem with (A).
> > > Your mkinitrd process builds a huge (200MB) initrd
> > > and there appears to be a BUG IN XEN that fails
> > > to load large initrds (larger than about 100MB)!
> > >
> > > Your initrd is so large because my lib/modules/2.6.29
> > > is very large. If I delete that from the initrd built
> > > using the howto, the initrd.gz is only about 24MB
> > > and I am able to boot and see the ATIX logo and
> > > it drops into the rescue shell (because I haven't
> > > specified the MAC addresses, I think). However, other
> > > boot errors complain about modules that are missing.
> > > I will try to build a kernel with fewer modules and
> > > see how that goes.
> > >
> > > Other feedback:
> > >
> > > I wonder if your scripts detecting "distribution"
> > > and "shortdistribution" are correct for all versions
> > > of RHEL and Oracle Enterprise Linux? I am getting
> > > "unknown" for both (from listparameters in the rescue
> > > shell), though I am booting Oracle Enterprise Linux 5
> > > update 2.
> > >
> > > I also see that your test for detecting xen in xen-lib.sh
> > > is not very good. You may want to test for /proc/xen
> > > instead of (or in addition to) /etc/xen.
> > >
> > > And why when "Loading modules for all found network cards"
> > > do I get "FATAL: Module xennet not found"? I have xen
> > > networking compiled into my kernel so there is no module
> > > for it. Should this be fatal?
> > >
> > > > Ok. But still we need the MAC address for detecting the node's
> > > > identity in the /etc/cluster/cluster.conf.
> > >
> > > Is that really necessary? Ocfs2 only requires the
> > > node name, not the MAC address. But I guess I can
> > > configure the MAC address in the xen guest config file
> > > so I can live with this.
> > >
> > > Thanks,
> > > Dan
> > >
> > > > -----Original Message-----
> > > > From: Marc Grimme [mailto:gr...@at...]
> > > > Sent: Monday, April 27, 2009 12:52 AM
> > > > To: ope...@li...
> > > > Cc: Dan Magenheimer
> > > > Subject: Re: [OSR-users] New Preview RPMs for next Release Candidate of comoonics-bootimage
> > > >
> > > > On Friday 24 April 2009 00:42:17 Dan Magenheimer wrote:
> > > > > Hi Marc --
> > > > >
> > > > > Thanks for the help. I got past my rpm problems and am
> > > > > now much further along but have hit another roadblock.
> > > > >
> > > > > First, FYI, I am using a different approach than the
> > > > > RHEL5 OCFS2 Shared Root Mini Howto, because of the
> > > > > way that I want to use the shared root. Specifically,
> > > > > I am first building and booting a root ocfs2 filesystem,
> > > > > using my own kernel (2.6.29) and my own initrd.img.
> > > > > With this (and before I install any OSR stuff), I am able
> > > > > to boot it as a Xen paravirtualized guest, using
> > > > > the Xen kernel= and ramdisk= config options; the root
> > > > > disk is NOT an LVM because I don't need a /boot. The
> > > >
> > > > No problem with this; my osr-ocfs2 cluster is running exactly
> > > > the same. No LVM, direct boot via kernel and initrd. That
> > > > should not be a problem.
> > > >
> > > > > 2.6.29 kernel has CONFIG_IP_PNP and I pass in the
> > > > > IP address and hostname from the Xen config file.
> > > > > This all seems to work fine (for a single node).
> > > >
> > > > Ok. But still we need the MAC address for detecting the node's
> > > > identity in the /etc/cluster/cluster.conf.
> > > > Also you might try to set the onboot flag in the cluster.conf
> > > > at the nic config to "no":
> > > > <com_info>
> > > >   ..
> > > >   <eth name="eth0" mac="..." onboot="no"/>
> > > >   ..
> > > > </com_info>
> > > > I didn't test it yet (it is not YET in my test cases) but I'm
> > > > pretty confident it should work.
> > > >
> > > > > Next, I install the OSR rpms directly in the running
> > > > > ocfs2-root guest, then shut it down.
> > > >
> > > > Ok. So far so good.
> > > >
> > > > > Next, I mount the ocfs2-root-disk from another guest
> > > > > (that also has the OSR rpms installed) and follow
> > > > > the howto steps to create the cdsl infrastructure
> > > > > and links. Then I shut down the other guest.
> > > >
> > > > Could you recall the exact steps and outputs?
> > > >
> > > > > Next, I try to boot the OSR-modified ocfs2-root guest,
> > > > > but it has problems. It appears that /var doesn't
> > > > > exist as I get many messages such as:
> > > >
> > > > That /var is not mounted is very strange. It should put you in
> > > > a rescue shell.
> > > > Then type messages and send me the output.
> > > >
> > > > > /etc/rc.d/rc.sysinit: /var/log/dmesg: No such file or directory
> > > >
> > > > That's just before it tries to boot. That's somehow too far advanced.
> > > >
> > > > > and then the boot process seems to hang trying to start the
> > > > > System Logger. No, it just takes a very long time and
> > > > > eventually I get to a login prompt. (Or I can boot
> > > > > single-user mode and get the same error messages, but
> > > > > get to a bash prompt.)
> > > > >
> > > > > With a "ls -l /var", I see:
> > > > >
> > > > > lrwxrwxrwx 1 root root 14 <date> /var -> cdsl.local/var
> > > >
> > > > That's perfectly ok.
> > > >
> > > > > (Note no leading / before cdsl.local)
> > > > >
> > > > > but "ls -ld /cdsl.local" shows it is empty.
> > > >
> > > > That's strange.
> > > >
> > > > > Browsing around, I see that /cluster/cdsl is populated
> > > > > (with subdirectories 0 ... 7 and default) and each has
> > > > > an etc and a var subdirectory. /cluster/shared has
> > > > > a var subdirectory and a var/lib subdirectory.
> > > >
> > > > That's again perfectly ok.
> > > >
> > > > > So I'm guessing that cdsl.local should somehow be
> > > > > linked to /cluster but isn't. True?
> > > >
> > > > Right, but it is not linked but bind mounted. That means:
> > > > mount --bind /cluster/cdsl/<nodeid> /cdsl.local
> > > > but that's done in the initrd automatically so you should not
> > > > have to bother about that.
> > > >
> > > > > One other thing I should mention... since my cluster.conf
> > > > > has 8 nodes numbered 0 to 7, in the "mount --bind"
> > > > > command during the cdsl setup steps, I used cluster/cdsl/0
> > > > > instead of cluster/cdsl/1 to bind to cdsl.local.
> > > >
> > > > How did you "use" that? That should be done automatically,
> > > > shouldn't it?
> > > >
> > > > > Any ideas? Maybe your initrd creates some necessary links
> > > > > and mine does not?
> > > > > (I tried booting with your initrd,
> > > > > but my ocfs2-root failed to mount giving a kernel panic...
> > > > > have you tested with linux-2.6.29? The error message
> > > > > "Heartbeat has to be started to mount a read-write
> > > > > clustered device" looks like it comes from a somewhat
> > > > > recent ocfs2 kernel patch I found here:
> > > > > http://www.mail-archive.com/ocf...@os.../msg00293.html
> > > > > and I worked around it by mounting with -o heartbeat=local)
> > > > >
> > > > > Sorry this is so long!
> > >
> > > What does your /etc/cluster/cluster.conf look like?
> > >
> > > --
> > > Gruss / Regards,
> > >
> > > Marc Grimme
> > > http://www.atix.de/ http://www.open-sharedroot.org/
> >
> > _______________________________________________
> > Open-sharedroot-users mailing list
> > Ope...@li...
> > https://lists.sourceforge.net/lists/listinfo/open-sharedroot-users
>
> --
> Gruss / Regards,
>
> Marc Grimme
> Phone: +49-89 452 3538-14
> http://www.atix.de/ http://www.open-sharedroot.org/
>
> ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 |
> 85716 Unterschleissheim | www.atix.de | www.open-sharedroot.org
>
> Commercial register: Amtsgericht Muenchen, registration number: HRB 168930,
> VAT ID: DE209485962 | Executive board: Marc Grimme, Mark Hlawatschek,
> Thomas Merz (chair) | Chairman of the supervisory board: Dr. Martin Buss
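
For reference, the two checks discussed in this thread (telling EL from RHEL by the release files, and testing /proc/xen rather than /etc/xen) could be sketched roughly as below. This is only an illustration and not the OSR code: the function names detect_distribution and is_xen, the labels "oel", "rhel" and "unknown", and the optional root-directory argument are all made up here.

```sh
#!/bin/sh
# Sketch only. The function names, the returned labels and the optional
# root-directory argument are hypothetical; none come from the OSR scripts.

# Print "oel", "rhel" or "unknown" for the tree rooted at $1 (default: /).
detect_distribution() {
    root="${1:-}"
    # /etc/enterprise-release exists on EL but not on RHEL. In 5u3 the
    # /etc/redhat-release contents no longer tell the two apart, so the
    # EL-only file is checked first.
    if [ -f "$root/etc/enterprise-release" ]; then
        echo "oel"
    elif [ -f "$root/etc/redhat-release" ]; then
        case "$(cat "$root/etc/redhat-release")" in
            "Enterprise Linux Enterprise Linux release"*) echo "oel" ;;
            "Red Hat Enterprise Linux"*)                  echo "rhel" ;;
            *)                                            echo "unknown" ;;
        esac
    else
        echo "unknown"
    fi
}

# Return 0 if the tree rooted at $1 (default: /) has a /proc/xen directory.
# /etc/xen is deliberately not consulted here.
is_xen() {
    root="${1:-}"
    [ -d "$root/proc/xen" ]
}
```

On a real system both functions would be called without an argument, e.g. `distribution=$(detect_distribution)`; the argument is only there so the checks can be tried against a scratch directory.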