I cannot subscribe to the developers' list, so I am posting this here.


My test OpenSSI cluster (1.0.0 rc1 on RH9) used to crash under moderate or heavy load while load leveling was happening. I traced the error to the "is_loadlevelable" function in load_level.c, at this line:

dentry = dget(PVP(p->p_vproc)->pvp_comm_de)


caused a kernel oops because pvp_comm_de's reference count is 0. I searched the source tree for "pvp_comm_de" and found that the problem MIGHT be in cluster/ssi/vproc/dvp_vpops.c, which calls dput but does not set the pointer to NULL afterwards.

After applying the patch below, it no longer crashes. I need someone to verify the logic of this patch and make sure that it won't have side effects.






--- cluster/ssi/vproc/dvp_vpops.c       2004-01-15 18:02:03.000000000 -0600
+++ /usr/src/redhat/BUILD/kernel-2.4.20/linux-2.4.20/cluster/ssi/vproc/dvp_vpops.c       2004-01-27 12:36:51.000000000 -0600
@@ -732,7 +732,9 @@
                        pv->pvp_pproc->exit_signal = -2;
+               pv->pvp_comm_de = NULL;
+               pv->pvp_comm_mnt = NULL;
        VPROC_UNLOCK_EXCL(v, "vpop_exit");
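
To make the suspected failure easier to review, here is a small userspace model of what I think is going on. It is only a sketch, not kernel or OpenSSI code: every name starting with model_ is hypothetical, the real dentry would also be freed at some point after its count reaches 0 (not modeled here), and I am assuming that the load-leveling side can cope with a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, NOT the kernel's struct dentry
 * or the OpenSSI private vproc structure. */
struct model_dentry {
    int d_count;                    /* reference count */
};

struct model_pvproc {
    struct model_dentry *comm_de;   /* plays the role of pvp_comm_de */
};

/* Take a reference. Returns NULL for a NULL pointer, and also NULL
 * when the count is already 0 (the case where the real kernel oopsed). */
static struct model_dentry *model_dget(struct model_dentry *d)
{
    if (d == NULL || d->d_count == 0)
        return NULL;
    d->d_count++;
    return d;
}

/* Drop a reference. Dropping one that is not held is a model bug. */
static void model_dput(struct model_dentry *d)
{
    assert(d == NULL || d->d_count > 0);
    if (d != NULL)
        d->d_count--;
}

/* Exit path without the patch: the reference is dropped but the
 * pointer is left in place, so other code can still reach it. */
static void model_exit_unpatched(struct model_pvproc *pv)
{
    model_dput(pv->comm_de);
}

/* Exit path with the patch: the pointer is cleared after the put. */
static void model_exit_patched(struct model_pvproc *pv)
{
    model_dput(pv->comm_de);
    pv->comm_de = NULL;
}

/* Load-leveling side: returns 1 if a reference could be taken
 * (and releases it again), 0 otherwise. */
static int model_is_loadlevelable(struct model_pvproc *pv)
{
    struct model_dentry *d = model_dget(pv->comm_de);
    if (d == NULL)
        return 0;
    model_dput(d);
    return 1;
}

/* Returns 1 if a pointer to a zero-count dentry is still reachable. */
static int model_has_stale_pointer(const struct model_pvproc *pv)
{
    return pv->comm_de != NULL && pv->comm_de->d_count == 0;
}
```

In this model, model_has_stale_pointer() reports 1 after model_exit_unpatched() and 0 after model_exit_patched(). Whether the real is_loadlevelable handles a NULL pvp_comm_de safely is exactly the part I would like someone to check.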