From: Marc G. <gr...@at...> - 2009-04-28 07:05:28
On Tuesday 28 April 2009 02:51:49 Dan Magenheimer wrote:
> Attached is the output from the messages command to
> the rescueshell for a freshly created initrd (with
> a smaller lib/modules).
>
> Some messages I see on the console which do not appear
> in the "messages" output:
>
> Detecting Hardware ./etc/hardware-lib.sh: line 458:
> unknown_hardware_detect: command not found

That is caused by the wrong distribution detection. This function is
called as ${distribution}_hardware_detect, which will fail in your case.
Send me the output of cat /etc/*-release and ls -1 /etc/*-release and
I'll make a patch for it.

>
> Loading modules for all found network cardsFATAL: Module xennet not found.
>
> error: "xen.independent_wallclock" is an unknown key
>
> > -----Original Message-----
> > From: Dan Magenheimer
> > Sent: Monday, April 27, 2009 6:14 PM
> > To: Marc Grimme; ope...@li...
> > Subject: Re: [OSR-users] New Preview RPMs for next Release Candidate of comoonics-bootimage
> >
> >
> > Hi Marc --
> >
> > Thanks for the reply. First, let me clarify that I am
> > trying two different approaches:
> >
> > (A) Use your entire OSR RHEL5+OCFS2 howto (with
> > a Xen-paravirtualized 2.6.29 kernel running as a
> > guest on Xen-3.4.0), or
> > (B) Build my own initrd and use your OSR scripts
> > only after my initrd finishes.
> >
> > I couldn't get (A) to work... the initrd built using
> > the howto was failing very early, so I decided to
> > try with (B). I hoped that (B) would be easier because
> > your code is very general to handle many different
> > kinds of systems, and mine could be much more specific.
> >
> > HOWEVER, I just discovered one problem with (A).
> > Your mkinitrd process builds a huge (200MB) initrd
> > and there appears to be a BUG IN XEN that fails
> > to load large initrds (larger than about 100MB)!
> >
> > Your initrd is so large because my lib/modules/2.6.29
> > is very large.
> > If I delete that from initrd built
> > using the howto, the initrd.gz is only about 24MB
> > and I am able to boot and see the ATIX logo and
> > it drops into the rescue shell (because I haven't
> > specified the MAC-Addresses I think). However other
> > boot errors complain about modules that are missing.
> > I will try to build a kernel with fewer modules and
> > see how that goes.
> >
> > Other feedback:
> >
> > I wonder if your scripts detecting "distribution"
> > and "shortdistribution" are correct for all versions
> > of RHEL and Oracle Enterprise Linux? I am getting
> > "unknown" for both (from listparameters in the rescue
> > shell), though I am booting Oracle Enterprise Linux 5
> > update 2.
> >
> > I also see that your test for detecting xen in xen-lib.sh
> > is not very good. You may want to test for /proc/xen
> > instead of (or in addition to) /etc/xen.
> >
> > And why when "Loading modules for all found network cards"
> > do I get "FATAL: Module xennet not found"? I have xen
> > networking compiled into my kernel so there is no module
> > for it. Should this be fatal?
> >
> > > Ok. But still we need the mac address for detecting the node's
> > > identity in the /etc/cluster/cluster.conf.
> >
> > Is that really necessary? Ocfs2 only requires the
> > node name, not the mac address. But I guess I can
> > configure the mac address in the xen guest config file
> > so I can live with this.
> >
> > Thanks,
> > Dan
> >
> > > -----Original Message-----
> > > From: Marc Grimme [mailto:gr...@at...]
> > > Sent: Monday, April 27, 2009 12:52 AM
> > > To: ope...@li...
> > > Cc: Dan Magenheimer
> > > Subject: Re: [OSR-users] New Preview RPMs for next Release Candidate of comoonics-bootimage
> > >
> > > On Friday 24 April 2009 00:42:17 Dan Magenheimer wrote:
> > > > Hi Marc --
> > > >
> > > > Thanks for the help. I got past my rpm problems and am
> > > > now much further along but have hit another roadblock.
> > > >
> > > > First, FYI, I am using a different approach than the
> > > > RHEL5 OCFS2 Shared Root Mini Howto, because of the
> > > > way that I want to use the shared root. Specifically,
> > > > I am first building and booting a root ocfs2 filesystem,
> > > > using my own kernel (2.6.29) and my own initrd.img. With
> > > > this (and before I install any OSR stuff), I am able
> > > > to boot it as a Xen paravirtualized guest, using
> > > > the Xen kernel= and ramdisk= config options; the root
> > > > disk is NOT an LVM because I don't need a /boot. The
> > >
> > > No problem with this, my osr-ocfs2 cluster is running exactly
> > > the same. No LVM, direct boot via kernel and initrd. That should
> > > not be a problem.
> > >
> > > > 2.6.29 kernel has CONFIG_IP_PNP and I pass in the
> > > > IP address and hostname from the Xen config file.
> > > > This all seems to work fine (for a single node).
> > >
> > > Ok. But still we need the mac address for detecting the node's
> > > identity in the /etc/cluster/cluster.conf.
> > > Also you might try to set the onboot flag in the cluster.conf
> > > at the nic config to "no":
> > > <com_info>
> > >   ..
> > >   <eth name="eth0" mac="..." onboot="no"/>
> > >   ..
> > > </com_info>
> > > I didn't test it yet (it is not YET in my testcases) but I'm
> > > pretty confident it should work.
> > >
> > > > Next, I install the OSR rpms directly in the running
> > > > ocfs2-root guest, then shut it down.
> > >
> > > Ok. so far so good.
> > >
> > > > Next, I mount the ocfs2-root-disk from another guest
> > > > (that also has the OSR rpms installed) and follow
> > > > the howto steps to create the cdsl infrastructure
> > > > and links. Then I shut down the other guest.
> > >
> > > Could you recall the exact steps and outputs?
> > >
> > > > Next, I try to boot the OSR-modified ocfs2-root guest,
> > > > but it has problems.
> > > > It appears that /var doesn't
> > > > exist as I get many messages such as:
> > >
> > > The failure to mount /var is very strange. It should put you in
> > > a rescue shell.
> > > Then type messages and send me the output.
> > >
> > > > /etc/rc.d/rc.sysinit: /var/log/dmesg: No such file or directory
> > >
> > > That's just before it tries to boot. That's somehow too far advanced.
> > >
> > > > and then the boot process seems to hang trying to start the
> > > > System Logger. No, it just takes a very long time and
> > > > eventually I get to a login prompt. (Or I can boot
> > > > single-user mode and get the same error messages, but
> > > > get to a bash prompt.)
> > > >
> > > > With a "ls -l /var", I see:
> > > >
> > > > lrwxrwxrwx 1 root root 14 <date> /var -> cdsl.local/var
> > >
> > > That's perfectly ok.
> > >
> > > > (Note no leading / before cdsl.local)
> > > >
> > > > but "ls -ld /cdsl.local" shows it is empty.
> > >
> > > That's strange.
> > >
> > > > Browsing around, I see that /cluster/cdsl is populated
> > > > (with subdirectories 0 ... 7 and default) and each has
> > > > an etc and a var subdirectory. /cluster/shared has
> > > > a var subdirectory and a var/lib subdirectory.
> > >
> > > That's again perfectly ok.
> > >
> > > > So I'm guessing that cdsl.local should somehow be
> > > > linked to /cluster but isn't. True?
> > >
> > > Right, but it is not linked but bind mounted. That means:
> > > mount --bind /cluster/cdsl/<nodeid> /cdsl.local
> > > but that's done in the initrd automatically so you should not
> > > have to bother about that.
> > >
> > > > One other thing I should mention... since my cluster.conf
> > > > has 8 nodes numbered 0 to 7, in the "mount --bind"
> > > > command during the cdsl setup steps, I used cluster/cdsl/0
> > > > instead of cluster/cdsl/1 to bind to cdsl.local.
> > >
> > > How did you "use" that? That should be done automatically,
> > > shouldn't it?
> > >
> > > > Any ideas?
> > > > Maybe your initrd creates some necessary links
> > > > and mine does not? (I tried booting with your initrd,
> > > > but my ocfs2-root failed to mount giving a kernel panic...
> > > > have you tested with linux-2.6.29? The error message
> > > > "Heartbeat has to be started to mount a read-write
> > > > clustered device" looks like it comes from a somewhat
> > > > recent ocfs2 kernel patch I found here:
> > > > http://www.mail-archive.com/ocf...@os.../msg00293.html
> > > > and I worked around it by mounting with -o heartbeat=local)
> > > >
> > > > Sorry this is so long!
> > >
> > > What does your /etc/cluster/cluster.conf look like?
> > >
> > > --
> > > Gruss / Regards,
> > >
> > > Marc Grimme
> > > http://www.atix.de/ http://www.open-sharedroot.org/
> _______________________________________________
> Open-sharedroot-users mailing list
> Ope...@li...
> https://lists.sourceforge.net/lists/listinfo/open-sharedroot-users

--
Gruss / Regards,

Marc Grimme

Phone: +49-89 452 3538-14
http://www.atix.de/ http://www.open-sharedroot.org/

ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 | 85716 Unterschleissheim | www.atix.de | www.open-sharedroot.org

Court of registration: Amtsgericht Muenchen, registration number: HRB 168930, VAT ID: DE209485962 | Executive board: Marc Grimme, Mark Hlawatschek, Thomas Merz (Chair) | Chairman of the supervisory board: Dr. Martin Buss
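
For reference, the "unknown_hardware_detect: command not found" message in this thread follows from the ${distribution}_hardware_detect naming pattern: when distribution detection yields "unknown", the composed function name does not exist. The sketch below only illustrates that pattern with a guard and a fallback. It is not the code from etc/hardware-lib.sh; rhel5_hardware_detect, generic_hardware_detect and the hardware_detect wrapper are hypothetical names chosen for the example.

```shell
#!/bin/sh
# Illustration only: these functions are hypothetical stand-ins, not the
# real comoonics-bootimage implementations.
rhel5_hardware_detect() {
    echo "rhel5 hardware detection"
}

generic_hardware_detect() {
    echo "generic hardware detection"
}

# Compose the function name from the detected distribution and call it
# only if such a command exists; otherwise use the generic fallback.
# Note: "command -v" also matches external programs of the same name.
hardware_detect() {
    func="${1}_hardware_detect"
    if command -v "$func" >/dev/null 2>&1; then
        "$func"
    else
        echo "no $func available, using generic detection" >&2
        generic_hardware_detect
    fi
}

hardware_detect rhel5
hardware_detect unknown
```

With a guard of this kind, an unrecognised distribution would produce a warning and fall back instead of failing with "command not found"; whether a generic fallback is appropriate for the real scripts is an open assumption.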