This is what I did:
Two-node ci-linux cluster.
Nodes 1 and 2 work smoothly, taking and
releasing locks.
Bring node 2 down. Try to acquire the lock on node 1.
No response; it seems to be waiting for some information.
^C will end the application.
Analysis:
There seems to be a problem with the state of the node and the
lock status. Attached is the dmesg output from trying to
take the lock on node 1 after node 2 has gone down.
Trying to acquire a lock with a new name works.
Attachment: dmesg output during the lock operation.
Logged In: YES
user_id=230991
Another observation:
Bringing node 2 back up makes the locking work again.
I am taking the lock using:

status = dlmlock_sync( LKM_EXMODE, &lockst, LKM_VALBLK,
                       LOCK_NAME, strlen( LOCK_NAME ), NULL, NULL );

and unlocking it using:

status = dlmunlock_sync( lockst.lockid, NULL,
                         LKM_FORCE | LKM_INVVALBLK );
With a single node, /proc/haDLM/rldb shows the following entry:

72719 | Openlocktest | VMS | 0005 | 3

With two nodes:

72719 | Openlocktest | VMS | 03E5 | 3

The flag value changed from 0005 to 03E5.
When the same test case is run on the newly added node, I get:

72719 | Openlocktest | VMS | 0005 | 2

After this, when the newly added node goes down, any
lock operation on Openlocktest hangs.
-aneesh