From: Dan M. <dan...@or...> - 2009-04-28 00:14:13
Hi Marc --

Thanks for the reply.

First, let me clarify that I am trying two different approaches:

(A) Use your entire OSR RHEL5+OCFS2 howto (with a Xen-paravirtualized
    2.6.29 kernel running as a guest on Xen-3.4.0), or
(B) Build my own initrd and use your OSR scripts only after my
    initrd finishes.

I couldn't get (A) to work... the initrd built using the howto
was failing very early, so I decided to try (B). I hoped that
(B) would be easier because your code has to be very general to
handle many different kinds of systems, and mine could be much
more specific.

HOWEVER, I just discovered one problem with (A). Your mkinitrd
process builds a huge (200MB) initrd and there appears to be a
BUG IN XEN that fails to load large initrds (larger than about
100MB)! Your initrd is so large because my lib/modules/2.6.29
is very large. If I delete that from the initrd built using the
howto, the initrd.gz is only about 24MB and I am able to boot
and see the ATIX logo, and then it drops into the rescue shell
(because I haven't specified the MAC addresses, I think).
However, other boot errors complain about missing modules. I
will try to build a kernel with fewer modules and see how that
goes.

Other feedback:

I wonder if your scripts detecting "distribution" and
"shortdistribution" are correct for all versions of RHEL and
Oracle Enterprise Linux? I am getting "unknown" for both (from
listparameters in the rescue shell), though I am booting Oracle
Enterprise Linux 5 update 2.

I also see that your test for detecting xen in xen-lib.sh is not
very good. You may want to test for /proc/xen instead of (or in
addition to) /etc/xen.

And why, when "Loading modules for all found network cards", do
I get "FATAL: Module xennet not found"? I have xen networking
compiled into my kernel so there is no module for it. Should
this be fatal?

> Ok. But still we need the mac address for detecting the node's
> identity in the /etc/cluster/cluster.conf.

Is that really necessary? Ocfs2 only requires the node name,
not the mac address. But I guess I can configure the mac
address in the xen guest config file so I can live with this.

Thanks,
Dan

> -----Original Message-----
> From: Marc Grimme [mailto:gr...@at...]
> Sent: Monday, April 27, 2009 12:52 AM
> To: ope...@li...
> Cc: Dan Magenheimer
> Subject: Re: [OSR-users] New Preview RPMs for next Release
> Candidate of comoonics-bootimage
>
> On Friday 24 April 2009 00:42:17 Dan Magenheimer wrote:
> > Hi Marc --
> >
> > Thanks for the help. I got past my rpm problems and am
> > now much further along but have hit another roadblock.
> >
> > First, FYI, I am using a different approach than the
> > RHEL5 OCFS2 Shared Root Mini Howto, because of the
> > way that I want to use the shared root. Specifically,
> > I am first building and booting a root ocfs2 filesystem,
> > using my own kernel (2.6.29) and my own initrd.img. With
> > this (and before I install any OSR stuff), I am able
> > to boot it as a Xen paravirtualized guest, using
> > the Xen kernel= and ramdisk= config options; the root
> > disk is NOT an LVM because I don't need a /boot. The
> No problem with this; my osr-ocfs2 cluster is running exactly
> the same way. No LVM, direct boot via kernel and initrd. That
> should not be a problem.
> > 2.6.29 kernel has CONFIG_IP_PNP and I pass in the
> > IP address and hostname from the Xen config file.
> > This all seems to work fine (for a single node).
> Ok. But still we need the mac address for detecting the node's
> identity in the /etc/cluster/cluster.conf.
> Also you might try to set the onboot flag in the cluster.conf
> at the nic config to "no":
> <com_info>
> ..
> <eth name="eth0" mac="..." onboot="no"/>
> ..
> </com_info>
> I didn't test it yet (it is not YET in my testcases) but I'm
> pretty confident it should work.
> >
> > Next, I install the OSR rpms directly in the running
> > ocfs2-root guest, then shut it down.
> Ok. So far so good.
> >
> > Next, I mount the ocfs2-root-disk from another guest
> > (that also has the OSR rpms installed) and follow
> > the howto steps to create the cdsl infrastructure
> > and links. Then I shut down the other guest.
> Could you recall the exact steps and outputs?
> >
> > Next, I try to boot the OSR-modified ocfs2-root guest,
> > but it has problems. It appears that /var doesn't
> > exist as I get many messages such as:
> That /var is not mounted is very strange. It should put you in
> a rescue shell. Then type messages and send me the output.
> >
> > /etc/rc.d/rc.sysinit: /var/log/dmesg: No such file or directory
> That's just before it tries to boot. That's somehow too far
> advanced.
> >
> > and then the boot process seems to hang trying to start the
> > System Logger. No, it just takes a very long time and
> > eventually I get to a login prompt. (Or I can boot
> > single-user mode and get the same error messages, but
> > get to a bash prompt.)
> >
> > With a "ls -l /var", I see:
> >
> > lrwxrwxrwx 1 root root 14 <date> /var -> cdsl.local/var
> That's perfectly ok.
> >
> > (Note no leading / before cdsl.local)
> >
> > but "ls -ld /cdsl.local" shows it is empty.
> That's strange.
> >
> > Browsing around, I see that /cluster/cdsl is populated
> > (with subdirectories 0 ... 7 and default) and each has
> > an etc and a var subdirectory. /cluster/shared has
> > a var subdirectory and a var/lib subdirectory.
> That's again perfectly ok.
> >
> > So I'm guessing that cdsl.local should somehow be
> > linked to /cluster but isn't. True?
> Right, but it is not linked, it is bind mounted. That means:
> mount --bind /cluster/cdsl/<nodeid> /cdsl.local
> but that's done in the initrd automatically so you should not
> have to bother about that.
> >
> > One other thing I should mention... since my cluster.conf
> > has 8 nodes numbered 0 to 7, in the "mount --bind"
> > command during the cdsl setup steps, I used cluster/cdsl/0
> > instead of cluster/cdsl/1 to bind to cdsl.local.
> How did you "use" that? That should be done automatically,
> shouldn't it?
> >
> > Any ideas? Maybe your initrd creates some necessary links
> > and mine does not? (I tried booting with your initrd,
> > but my ocfs2-root failed to mount giving a kernel panic...
> > have you tested with linux-2.6.29? The error message
> > "Heartbeat has to be started to mount a read-write
> > clustered device" looks like it comes from a somewhat
> > recent ocfs2 kernel patch I found here:
> > http://www.mail-archive.com/ocf...@os.../msg00293.html
> > and I worked around it by mounting with -o heartbeat=local)
> >
> > Sorry this is so long!
> What does your /etc/cluster/cluster.conf look like?
>
> --
> Gruss / Regards,
>
> Marc Grimme
> http://www.atix.de/ http://www.open-sharedroot.org/
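
P.S. To illustrate the Xen detection suggestion made in this message
(test for /proc/xen instead of, or in addition to, /etc/xen), here is a
minimal sketch. It is NOT the code from xen-lib.sh: the function name
detect_xen and its optional root-prefix argument are hypothetical and
exist only to show the order of the checks. It assumes that a Xen-aware
running kernel exposes /proc/xen, while /etc/xen only indicates that
Xen configuration files are installed.

```shell
#!/bin/sh
# Hypothetical helper; NOT the actual code from xen-lib.sh.
# Prints which evidence of Xen was found: "proc-xen", "etc-xen" or "none".
# The optional first argument is a root prefix (an assumption made here
# so the function can be exercised against a scratch directory).
detect_xen() {
    xen_root="${1:-}"
    if [ -e "${xen_root}/proc/xen" ]; then
        # Assumed to be exposed by a Xen-aware running kernel.
        echo "proc-xen"
    elif [ -e "${xen_root}/etc/xen" ]; then
        # Only shows that Xen configuration files are installed.
        echo "etc-xen"
    else
        echo "none"
    fi
}
```

Called without arguments, detect_xen inspects the live system. Checking
/proc/xen first means a guest that has no /etc/xen directory can still
be recognised, which is the case described above.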