Hi,
We are using HA-JDBC with JBoss in a two node PostgreSQL cluster, doing many
background db tasks out of thread pools. Things have generally been working
very well. However, sometimes those worker threads end up getting canceled for
one reason or another, and occasionally we hit a problem that ends up
confusing the JBoss JTA, and in our app we eventually run out of db
connections and crash (because of a db resource never being released). The
root of the problem is in the handleFailures call in HA-JDBC:
2012-08-14 12:23:43,047 ERROR (pool-118-thread-1) Error; - nested throwable:
(java.lang.IndexOutOfBoundsException: Index: 0, Size: 0)
2012-08-14 12:23:43,051 WARN (pool-118-thread-1) ARJUNA-16045 attempted
rollback of < formatId=131076, gtrid_length=29, bqual_length=28,
tx_uid=0:ffff7f000001:126a:502a7288:16fe, node_name=1,
branch_uid=0:ffff7f000001:126a:502a7288:1702, eis_name=unknown eis name >
(org.jboss.resource.adapter.jdbc.xa.XAManagedConnection@5d3aa2b9) failed with
exception code -: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at net.sf.hajdbc.sql.AbstractInvocationHandler.handleFailures(AbstractInvocationHandler.java:446)
at net.sf.hajdbc.sql.DatabaseWriteInvocationStrategy.invokeAll(DatabaseWriteInvocationStrategy.java:126)
at net.sf.hajdbc.sql.DatabaseWriteInvocationStrategy.invoke(DatabaseWriteInvocationStrategy
We traced this down to AbstractDatabaseCluster.getAliveMap, where the map
doesn't get populated if the thread is interrupted. In particular, right
here:
for (Map.Entry<Database<D>, Future<Boolean>> futureMapEntry: futureMap.entrySet())
{
    try
    {
        map.get(futureMapEntry.getValue().get()).add(futureMapEntry.getKey());
    }
    catch (ExecutionException e)
    {
        // isAlive does not throw an exception
        throw new IllegalStateException(e);
    }
    catch (InterruptedException e)
    {
        // This gets swallowed and the aliveMap ends up being empty
        Thread.currentThread().interrupt();
    }
}
We put some System.out.printlns in there and know that the problem only occurs
when that InterruptedException handler is fired.
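For anyone who wants to see the effect outside HA-JDBC, here is a minimal standalone sketch of the same loop. It is not the library code: the class name, the String keys standing in for Database<D>, and the never-run FutureTask objects standing in for the isAlive checks are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

class AliveMapInterruptDemo {
    // Hypothetical stand-in for getAliveMap: String keys replace Database<D>.
    static Map<Boolean, List<String>> partition(Map<String, Future<Boolean>> futureMap) {
        Map<Boolean, List<String>> map = new HashMap<>();
        map.put(false, new ArrayList<String>());
        map.put(true, new ArrayList<String>());
        for (Map.Entry<String, Future<Boolean>> entry : futureMap.entrySet()) {
            try {
                map.get(entry.getValue().get()).add(entry.getKey());
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            } catch (InterruptedException e) {
                // Same handling as the excerpt: restore the flag, record nothing.
                Thread.currentThread().interrupt();
            }
        }
        return map;
    }

    // Runs the loop on a thread whose interrupt flag is already set.
    static Map<Boolean, List<String>> runInterrupted() {
        Map<String, Future<Boolean>> futureMap = new LinkedHashMap<>();
        // The tasks are never run, so get() has to wait and observes the interrupt.
        futureMap.put("db1", new FutureTask<Boolean>(() -> true));
        futureMap.put("db2", new FutureTask<Boolean>(() -> true));
        Thread.currentThread().interrupt();
        try {
            return partition(futureMap);
        } finally {
            Thread.interrupted(); // clear the flag so the caller is unaffected
        }
    }

    public static void main(String[] args) {
        Map<Boolean, List<String>> result = runInterrupted();
        System.out.println("alive=" + result.get(true) + " dead=" + result.get(false));
    }
}
```

Running main prints alive=[] dead=[]. Because the handler re-sets the interrupt flag, the next get() that has to wait throws as well, so neither database ends up in either list.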
Subsequently, in handleFailures, both the aliveList and the deadList are
empty, so the IndexOutOfBoundsException occurs when trying to build the
SQLException below, because get(0) is called on an empty list:
public void handleFailures(SortedMap<Database<D>, Exception> exceptionMap) throws Exception
{
    if (exceptionMap.size() == 1)
    {
        throw exceptionMap.get(exceptionMap.firstKey());
    }
    Map<Boolean, List<Database<D>>> aliveMap = this.cluster.getAliveMap(exceptionMap.keySet());
    this.detectClusterPanic(aliveMap);
    List<Database<D>> aliveList = aliveMap.get(true);
    List<Database<D>> deadList = aliveMap.get(false);
    if (!aliveList.isEmpty())
    {
        for (Database<D> database: deadList)
        {
            if (this.cluster.deactivate(database, this.cluster.getStateManager()))
            {
                this.logger.error(Messages.getMessage(Messages.DATABASE_DEACTIVATED, database, this.cluster), exceptionMap.get(database));
            }
        }
    }
    List<Database<D>> list = aliveList.isEmpty() ? deadList : aliveList;
    // IOOBE occurs here:
    SQLException exception = SQLExceptionFactory.createSQLException(exceptionMap.get(list.get(0)));
    for (Database<D> database: list.subList(1, list.size()))
    {
        exception.setNextException(SQLExceptionFactory.createSQLException(exceptionMap.get(database)));
    }
    throw exception;
}
It seems to me that the InterruptedException should not be swallowed in
getAliveMap, but propagated to the caller. Failing that, handleFailures should
handle the case where both lists are empty.
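To make the second option concrete, here is a rough sketch of what such a guard might look like. The helper chooseList and the class around it are hypothetical and not part of HA-JDBC. The fallback (using every key in exceptionMap when both lists are empty) is only one possible choice, and it assumes exceptionMap itself is not empty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class FailureListGuard {
    // Hypothetical helper: picks the list that handleFailures would iterate,
    // falling back to all failed keys when liveness could not be determined.
    static <K> List<K> chooseList(List<K> aliveList, List<K> deadList,
                                  SortedMap<K, Exception> exceptionMap) {
        List<K> list = aliveList.isEmpty() ? deadList : aliveList;
        if (list.isEmpty()) {
            // Both lists are empty (e.g. the liveness check was interrupted).
            list = new ArrayList<K>(exceptionMap.keySet());
        }
        return list;
    }

    public static void main(String[] args) {
        SortedMap<String, Exception> exceptionMap = new TreeMap<>();
        exceptionMap.put("db1", new RuntimeException("write failed on db1"));
        exceptionMap.put("db2", new RuntimeException("write failed on db2"));
        List<String> empty = Collections.emptyList();
        // With both lists empty, the guard still yields something to index into.
        System.out.println(chooseList(empty, empty, exceptionMap));
    }
}
```

With a guard like this, list.get(0) always has an element as long as at least one failure was recorded, so the original exceptions would be reported instead of the IndexOutOfBoundsException.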
Any ideas on the best way to fix this?
Thanks,
Miles
Technically, the interrupt is not "swallowed" by getAliveMap(...); rather, the
interrupted status of the current thread is set again. However, your point
still stands. I think the best (i.e. most conservative) solution to this
issue is to presume a positive result if an InterruptedException is caught,
e.g.
catch (InterruptedException e)
{
    map.get(true).add(futureMapEntry.getKey());
    Thread.currentThread().interrupt();
}
I'll commit this fix to both branches.
Actually this only affects the 2.0 branch. In master, failure detection is
done via SQLException introspection instead of a separate validation query.
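The proposed handler can also be exercised in isolation. The sketch below is a standalone approximation, not the 2.0 branch code: the class name, the String keys in place of Database<D>, and the never-run FutureTask objects are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

class PresumeAliveOnInterrupt {
    static Map<Boolean, List<String>> partition(Map<String, Future<Boolean>> futureMap) {
        Map<Boolean, List<String>> map = new HashMap<>();
        map.put(false, new ArrayList<String>());
        map.put(true, new ArrayList<String>());
        for (Map.Entry<String, Future<Boolean>> entry : futureMap.entrySet()) {
            try {
                map.get(entry.getValue().get()).add(entry.getKey());
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            } catch (InterruptedException e) {
                // Presume the database is alive, then restore the interrupt flag.
                map.get(true).add(entry.getKey());
                Thread.currentThread().interrupt();
            }
        }
        return map;
    }

    // Returns a summary of the result plus whether the flag was still set afterwards.
    static String runInterrupted() {
        Map<String, Future<Boolean>> futureMap = new LinkedHashMap<>();
        // Never-run tasks: get() has to wait, so it observes the interrupt.
        futureMap.put("db1", new FutureTask<Boolean>(() -> true));
        futureMap.put("db2", new FutureTask<Boolean>(() -> true));
        Thread.currentThread().interrupt();
        Map<Boolean, List<String>> result = partition(futureMap);
        boolean stillInterrupted = Thread.interrupted(); // reads and clears the flag
        return "alive=" + result.get(true) + " dead=" + result.get(false)
                + " interrupted=" + stillInterrupted;
    }

    public static void main(String[] args) {
        System.out.println(runInterrupted());
    }
}
```

This prints alive=[db1, db2] dead=[] interrupted=true: no database is treated as dead because of the interrupt, the alive list is never empty, and the caller can still observe the interrupted status.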