Hi, paul, I have a few questions with locks of HA-JDBC, can you help me?
In LockingInvocationStrategy.invoke method, you make a circle to lock all the locks in the locklist, I think it may cause a deadlock if anothor thread is executing this method in the same time, is there any protection to avoid this?
DistributedLock.lock is called in LockingInvocationStrategy.invoke method if HA-JDBC uses jgroups. And in DistributedLock.lock, if this node is not coordinator, it send CoordinatorAcquireLockCommand with timeout=Long.MAX_VALUE to coordinator, then the coordinator first tryLock itself with the timeout, but the jgroups has it own timeout configured by the user, it will return null which is converted to false by HA-JDBC after time out, but it doesn't mean coordinator truely failed, what will happen if coordinator success after jgroups time out?
And even we set timeout equals jgroups's timeout with this.lock.tryLock(time, unit) in DistributedLock.tryLock(long time, TimeUnit unit) method, we will still have same problems: lockMembers method needs uncertain times, it may make the total time exceed the timeout.
in DistributedLockManager.lockMembers, "results" may be null if jgroups throws exception.
in DistributedLockManager.lockMembers, "locked &= entry.getValue()", what will happen if entry.getValue() is null? Or it can't be null? I don't think it is a certain thing.
In MemberAcquireLockCommand.execute(), you lock first, then put the lock into lockMap, but what will happen if the lockMap is null? it will cause a NPE, and leave a lock that can't be unlocked. This situation may happen when some node cause a MemberAcquireLockCommand then die, and if we can't make certain that jgroups will call MemberAcquireLockCommand.execute() before call viewAccepted(). I think make sure lockMap is not null before lock is a better idea, WDYT?
Last edit: yylqinghao 2014-08-18
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
You are right - if a given statement requires multiple lock acquisitions, the deadlocks are possible if the order isn't consistent. I think sorting the list of objects to lock per statement should fix that concern.
It can't be null. It will either have a value, or not be contained in the map of results.
Traditionally, using FLUSH in the stack would prevent any such race condition. I had removed it, but now that we're relying on JGroups state transfer, I need to make sure the default stack includes it.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi, paul, I have a few questions with locks of HA-JDBC, can you help me?
In LockingInvocationStrategy.invoke method, you make a circle to lock all the locks in the locklist, I think it may cause a deadlock if anothor thread is executing this method in the same time, is there any protection to avoid this?
DistributedLock.lock is called in LockingInvocationStrategy.invoke method if HA-JDBC uses jgroups. And in DistributedLock.lock, if this node is not coordinator, it send CoordinatorAcquireLockCommand with timeout=Long.MAX_VALUE to coordinator, then the coordinator first tryLock itself with the timeout, but the jgroups has it own timeout configured by the user, it will return null which is converted to false by HA-JDBC after time out, but it doesn't mean coordinator truely failed, what will happen if coordinator success after jgroups time out?
And even we set timeout equals jgroups's timeout with this.lock.tryLock(time, unit) in DistributedLock.tryLock(long time, TimeUnit unit) method, we will still have same problems: lockMembers method needs uncertain times, it may make the total time exceed the timeout.
in DistributedLockManager.lockMembers, "results" may be null if jgroups throws exception.
in DistributedLockManager.lockMembers, "locked &= entry.getValue()", what will happen if entry.getValue() is null? Or it can't be null? I don't think it is a certain thing.
In MemberAcquireLockCommand.execute(), you lock first, then put the lock into lockMap, but what will happen if the lockMap is null? it will cause a NPE, and leave a lock that can't be unlocked. This situation may happen when some node cause a MemberAcquireLockCommand then die, and if we can't make certain that jgroups will call MemberAcquireLockCommand.execute() before call viewAccepted(). I think make sure lockMap is not null before lock is a better idea, WDYT?
Last edit: yylqinghao 2014-08-18
Last edit: yylqinghao 2014-08-18
Regarding #1, I verified that this is already the case, i.e. locks are acquired in a consistent (alphabetical) order.
For my questions 4, you think it can't be null, but the code is:
for (Map.Entry<Address, Rsp<R="">> entry: responses.entrySet())
{
Rsp<R> response = entry.getValue();
result = response.wasReceived() ? response.getValue() : null;
results.put(new AddressMember(entry.getKey()), result);
}
If response.wasReceived() is false, then result will be null, and results map will have an null value record.
And in my opinion, more protection in code is better.
I've refactored the CommandDispatcher API to draw a better distinction between sender vs receiver exceptions.
https://github.com/ha-jdbc/ha-jdbc/commit/7178c9a99dae68e6872bd86a02bf0739cd0059e0