From: Dan M. <dan...@or...> - 2009-05-04 23:30:16
Hi Marc --

I've worked through the OSR setup process again in my
environment and am documenting it. For the most part it
works, but only intermittently. Booting some nodes works
fine once and then fails the next time with no changes.
One common failure appears to occur in the
"Detecting nodeid & nodename..." step.

There is another problem I've seen with this new cluster:
when something goes wrong (such as the above "Detecting..."
failure), the boot process on this cluster no longer falls
into a rescue shell but instead into a bash shell. So I
can't look at the repository.

One difference with this cluster is that I ran the
com.../mkinitrd with your new "-l" option. I wonder if
this option is failing to copy a shared library or
something else the rescueshell needs, so that the
rescueshell fails to start?

Thanks,
Dan

> -----Original Message-----
> From: Dan Magenheimer
> Sent: Thursday, April 30, 2009 10:44 AM
> To: Marc Grimme; ope...@li...
> Subject: RE: [OSR-users] New Preview RPMs for next Release
> Candidate of comoonics-bootimage
>
> > > I think you are probably working with a xen-linux guest with
> > > kernel version 2.6.18? I think the patch that causes
> > > this problem came after linux-2.6.19 in this Linux commit:
> > >
> > > http://www.kernel.org/hg/index.cgi/linux-2.6/rev/710f6c6bd06c
> > I read from this patch that heartbeat=local is needed if you
> > want to use ocfs2
> > as local filesystem. That's the reason why heartbeat is set
> > to be local. But
> > that implies not to being able to share this filesystem with
> > more nodes.
> >
> > But when you want to use it concurrently you must not use
> > heartbeat=local.
> >
> > Or did I read something wrong?
>
> I agree that "heartbeat=local" is confusing.
>
> The ocfs2 team tells me that "heartbeat=local" is used
> for a cluster where the heartbeat is on the same disk
> as the filesystem. For non-clustered mounts,
> "heartbeat=none" is used. There was going to be
> a "heartbeat=global" to designate one volume as the
> heartbeat device for all volumes, but that patch
> was shelved.
>
> Unrelated, I often randomly get the following message
> when trying to boot an OSR el5-ocfs2 image. When I
> reboot (without changing anything at all) it boots fine.
> All of my config files look fine (and they work fine
> on reboot). Maybe there is a race somewhere?
>
> Thanks,
> Dan
>
> The nodeid for this node could not be detected. This usually
> means the MAC-Addresses
> specified in the cluster configuration could not be matched
> to any MAC-Adresses of this node.
>
> You can either fix this and build a new initrd or hardset
> the nodeid via "setparameter nodeid <number>"
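
For illustration only, the kind of MAC-to-nodeid matching that the error message describes might look roughly like the sketch below. This is not the comoonics-bootimage code: the function names, the `cluster_macs` dictionary layout, and the example addresses are all hypothetical. The second call shows how an unchanged, correct configuration could still fail to match if the list of local interfaces happened to be empty when detection ran, which would be consistent with the race suspected above.

```python
def normalize_mac(mac):
    """Lower-case a MAC address and use ':' as the separator."""
    return mac.strip().lower().replace("-", ":")


def detect_nodeid(cluster_macs, local_macs):
    """Return the nodeid whose configured MAC matches a local NIC, else None.

    cluster_macs: dict mapping nodeid -> MAC string (hypothetical layout,
                  not the real cluster configuration format).
    local_macs:   iterable of MAC strings found on this node.
    """
    local = {normalize_mac(m) for m in local_macs}
    for nodeid, mac in cluster_macs.items():
        if normalize_mac(mac) in local:
            return nodeid
    return None


# Hypothetical example data.
cluster_macs = {1: "00:16:3E:00:00:01", 2: "00:16:3E:00:00:02"}

# NIC visible at detection time: the configured entry matches.
print(detect_nodeid(cluster_macs, ["00:16:3e:00:00:02"]))

# No NIC visible yet: same configuration, but nothing can match.
print(detect_nodeid(cluster_macs, []))
```

If something like this were the cause, the configuration would be fine and only the timing of interface discovery would differ between a failing boot and a working one.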