One possibility is that the system is idling because of a deadlock on a
sleep lock. A simple deadlock happens when process A holds lock M
and wants lock N, while process B holds lock N and wants lock M. This is
avoided by always acquiring locks in the same order, and always releasing
them in the opposite order.
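
To make the ordering rule concrete, here is a minimal user-space sketch in Python. It is not OpenSSI or kernel code, and all of the names in it (lock_m, lock_n, LOCK_ORDER, acquire_in_order, and so on) are made up for illustration. Thread A asks for M then N and thread B asks for N then M, but a helper always takes the locks in one assumed global order, so the A/B cycle described above cannot form.

```python
import threading

# Hypothetical locks standing in for the sleep locks M and N.
lock_m = threading.Lock()
lock_n = threading.Lock()

# Assumed global order: every thread must take M before N.
LOCK_ORDER = {id(lock_m): 0, id(lock_n): 1}


def acquire_in_order(*locks):
    """Acquire the given locks in the global order; return them in that order."""
    ordered = sorted(locks, key=lambda lk: LOCK_ORDER[id(lk)])
    for lock in ordered:
        lock.acquire()
    return ordered


def release_in_reverse(ordered):
    """Release locks in the opposite order to the one they were taken in."""
    for lock in reversed(ordered):
        lock.release()


def worker(first, second, rounds, log):
    # The caller may ask for the locks in either order; the helper always
    # takes them in the global order, so no wait-for cycle can form.
    for _ in range(rounds):
        held = acquire_in_order(first, second)
        try:
            log.append(threading.current_thread().name)
        finally:
            release_in_reverse(held)


def run_demo(rounds=1000):
    """Run threads A and B; return (both_finished, number_of_log_entries)."""
    log = []
    a = threading.Thread(target=worker, name="A", daemon=True,
                         args=(lock_m, lock_n, rounds, log))
    b = threading.Thread(target=worker, name="B", daemon=True,
                         args=(lock_n, lock_m, rounds, log))
    a.start()
    b.start()
    a.join(timeout=10)
    b.join(timeout=10)
    # If both threads have exited, no deadlock occurred.
    return (not a.is_alive() and not b.is_alive()), len(log)
```

Calling run_demo() should report that both threads finished. If worker() instead acquired `first` and then `second` directly, A and B could each end up holding one lock and waiting forever for the other.
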
I'm not sure how to boot with only 1 processor, other than rebuilding
the kernel as uniprocessor, but there might be a kernel boot argument
for this. Does anybody else know?

I think that if this is due to a sleep lock deadlock, then running with
1 processor would make the bug appear to go away.

BTW, I can be slow to respond sometimes, so it's best to also copy the
ssic-linux-devel mailing list, in case somebody else can help.
Rudolf Gabler wrote:
> Hi Brian,
> My system is a blade system with 2 processors per node.
> I started to debug in the following environment:
> The first node is running. When I entered kdb on the second node (it hangs at
> the initproc call), I used a [...]
> and got the information that the system is idling. (No wonder, I think: I have
> an idle processor, 1 of 4.) A moment later the system tells me that it
> is going down (and the other node also).
> This is my first kernel debugging session.
> Question: How do I start with only 1 processor?
> -----Original Message-----
> From: Brian J. Watson [mailto:Brian.J.Watson@...]
> Sent: Friday, 17 March 2006 02:49
> To: Rudolf Gabler
> Cc: ssic-linux-devel@...
> Subject: Re: [SSI-devel] 1.9.x and gfs
> Hi Rudi,
> It's been quite some time since GFS was tested with OpenSSI. What you're
> attempting is definitely a developer-type activity. If you're not afraid
> to use kdb and dig into the OpenSSI and GFS kernel source, then you
> could earn the honors of making it work again. ;)
> The place to start would probably be the initproc_postroot_init()
> routine in cluster/ssi/vproc/nsc_initproc.c of the OpenSSI kernel
> source. I think this is what `check_config --initproc' calls in the kernel.
> Rudolf Gabler wrote:
>> Hi all,
>> I'm new to this group and have set up a 1.9.1 2-node OpenSSI Fedora cluster on
>> i386. Chard-mount is not working for me, but the cluster is running well on
>> Fedora 3.
>> To make it HA, I tried to set up a GFS root: I fetched the cluster-1.0 sources from
>> Red Hat, built the modules, and changed the initrd to start ccsd, cman and
>> fencing before mounting GFS. With this I get a running 1-node cluster.
>> When I try to boot the second node into the cluster, it hangs (for at least [...]
>> days before I stopped it) directly when it tries to exec check_config
>> --initproc (i.e. to fork init).
>> I don't know what could be happening, nor which debugging tool I should use to
>> investigate it.
>> Does anyone have a hint for me?
>> Rudi Gabler