problem with getAliveMap if thread interrupts

MliesK
2012-08-15
2012-09-28
  • MliesK

    MliesK - 2012-08-15

    Hi,

    We are using HA-JDBC with JBoss in a two node PostgreSQL cluster, doing many
    background db tasks out of thread pools. Things have generally been working
    very well. However, sometimes those worker threads end up getting canceled for
    one reason or another, and occasionally we hit a problem that ends up
    confusing the JBoss JTA, and in our app we eventually run out of db
    connections and crash (because of a db resource never being released). The
    root of the problem is in the handleFailures call in HA-JDBC:

    2012-08-14 12:23:43,047 ERROR (pool-118-thread-1) Error; - nested throwable:
    (java.lang.IndexOutOfBoundsException: Index: 0, Size: 0)
    2012-08-14 12:23:43,051 WARN (pool-118-thread-1) ARJUNA-16045 attempted
    rollback of < formatId=131076, gtrid_length=29, bqual_length=28,
    tx_uid=0:ffff7f000001:126a:502a7288:16fe, node_name=1,
    branch_uid=0:ffff7f000001:126a:502a7288:1702, eis_name=unknown eis name >
    (org.jboss.resource.adapter.jdbc.xa.XAManagedConnection@5d3aa2b9) failed with
    exception code -: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at net.sf.hajdbc.sql.AbstractInvocationHandler.handleFailures(AbstractInvocationHandler.java:446)
    at net.sf.hajdbc.sql.DatabaseWriteInvocationStrategy.invokeAll(DatabaseWriteInvocationStrategy.java:126)
    at net.sf.hajdbc.sql.DatabaseWriteInvocationStrategy.invoke(DatabaseWriteInvocationStrategy

    We traced this to AbstractDatabaseCluster.getAliveMap, where the map doesn't
    get populated if the calling thread is interrupted. The relevant loop is
    right here:

    for (Map.Entry<Database<D>, Future<Boolean>> futureMapEntry: futureMap.entrySet())
    {
        try
        {
            map.get(futureMapEntry.getValue().get()).add(futureMapEntry.getKey());
        }
        catch (ExecutionException e)
        {
            // isAlive does not throw an exception
            throw new IllegalStateException(e);
        }
        catch (InterruptedException e)
        {
            // This gets swallowed and the aliveMap ends up being empty
            Thread.currentThread().interrupt();
        }
    }

    We added some System.out.println calls there and confirmed that the problem
    only occurs when that InterruptedException handler fires.
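
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not HA-JDBC code: the class name AliveMapInterruptDemo, the runInterrupted helper, the latch, and the use of plain String names in place of Database<D> are all assumptions made for illustration. The loop and its catch blocks mirror the quoted snippet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AliveMapInterruptDemo {

    // Simplified stand-in for getAliveMap: String names replace Database<D>.
    static Map<Boolean, List<String>> getAliveMap(List<String> databases,
            Callable<Boolean> isAlive, ExecutorService executor) {
        Map<String, Future<Boolean>> futureMap = new LinkedHashMap<>();
        for (String database : databases) {
            futureMap.put(database, executor.submit(isAlive));
        }

        Map<Boolean, List<String>> map = new TreeMap<>();
        map.put(false, new ArrayList<>());
        map.put(true, new ArrayList<>());

        for (Map.Entry<String, Future<Boolean>> entry : futureMap.entrySet()) {
            try {
                map.get(entry.getValue().get()).add(entry.getKey());
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            } catch (InterruptedException e) {
                // Same handling as the quoted code: the flag is restored,
                // but the database lands in neither list.
                Thread.currentThread().interrupt();
            }
        }
        return map;
    }

    // Calls getAliveMap with the calling thread's interrupt flag already set.
    static Map<Boolean, List<String>> runInterrupted() {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        CountDownLatch release = new CountDownLatch(1);
        // The check blocks until released, so get() has to wait and sees the interrupt.
        Callable<Boolean> isAlive = () -> {
            release.await();
            return Boolean.TRUE;
        };
        try {
            Thread.currentThread().interrupt();
            return getAliveMap(Arrays.asList("db1", "db2"), isAlive, executor);
        } finally {
            Thread.interrupted(); // clear the flag again for the caller
            release.countDown();  // let the pending checks finish
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runInterrupted());
    }
}
```

Because the interrupt flag is restored inside the catch block, every subsequent Future.get() that still has to wait throws again, so in this sketch both lists come back empty even though every check would have reported the database as alive.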

    Subsequently, in handleFailures, both the aliveList and the deadList are
    empty, so the IndexOutOfBoundsException occurs when building the
    SQLException below, because list.get(0) is called on an empty list:

    public void handleFailures(SortedMap<Database<D>, Exception> exceptionMap) throws Exception
    {
        if (exceptionMap.size() == 1)
        {
            throw exceptionMap.get(exceptionMap.firstKey());
        }

        Map<Boolean, List<Database<D>>> aliveMap = this.cluster.getAliveMap(exceptionMap.keySet());

        this.detectClusterPanic(aliveMap);

        List<Database<D>> aliveList = aliveMap.get(true);
        List<Database<D>> deadList = aliveMap.get(false);

        if (!aliveList.isEmpty())
        {
            for (Database<D> database: deadList)
            {
                if (this.cluster.deactivate(database, this.cluster.getStateManager()))
                {
                    this.logger.error(Messages.getMessage(Messages.DATABASE_DEACTIVATED, database, this.cluster), exceptionMap.get(database));
                }
            }
        }

        List<Database<D>> list = aliveList.isEmpty() ? deadList : aliveList;

        // IOOBE occurs here:
        SQLException exception = SQLExceptionFactory.createSQLException(exceptionMap.get(list.get(0)));

        for (Database<D> database: list.subList(1, list.size()))
        {
            exception.setNextException(SQLExceptionFactory.createSQLException(exceptionMap.get(database)));
        }

        throw exception;
    }

    It seems to me that the InterruptedException should not be swallowed in
    getAliveMap, but propagated to the caller. Either that, or handleFailures
    should handle the case where both lists are empty.
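
As a rough sketch of the second option (guarding against both lists being empty), the exception-building step could fall back to every database recorded in exceptionMap. Everything here is an assumption for illustration: the class HandleFailuresGuardDemo, the buildException and toSQLException helpers (the latter standing in for SQLExceptionFactory.createSQLException), and String names in place of Database<D>. It is not a proposed patch.

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;

public class HandleFailuresGuardDemo {

    // Hypothetical helper: chains the recorded exceptions into one SQLException.
    // If both lists are empty, it falls back to all databases in exceptionMap
    // instead of calling get(0) on an empty list.
    static SQLException buildException(SortedMap<String, Exception> exceptionMap,
            List<String> aliveList, List<String> deadList) {
        List<String> list = aliveList.isEmpty() ? deadList : aliveList;
        if (list.isEmpty()) {
            list = new ArrayList<>(exceptionMap.keySet());
        }
        if (list.isEmpty()) {
            // Nothing was recorded at all; return a generic exception.
            return new SQLException("No failure details available");
        }

        SQLException exception = toSQLException(exceptionMap.get(list.get(0)));
        for (String database : list.subList(1, list.size())) {
            exception.setNextException(toSQLException(exceptionMap.get(database)));
        }
        return exception;
    }

    // Stand-in for SQLExceptionFactory.createSQLException in this sketch.
    static SQLException toSQLException(Exception e) {
        return (e instanceof SQLException) ? (SQLException) e : new SQLException(e);
    }
}
```

With this guard the caller still receives the original database errors rather than an IndexOutOfBoundsException, though it does not address why the alive map was empty in the first place.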

    Any ideas on the best way to fix this?

    Thanks,
    Miles

     
  • Paul Ferraro

    Paul Ferraro - 2012-08-16

    Technically, the interrupt is not "swallowed" by getAliveMap(...); rather,
    the interrupted status of the current thread is restored. However, your
    point still stands. I think the best (i.e. most conservative) solution to
    this issue is to presume a positive result if an InterruptedException is
    caught, e.g.

    catch (InterruptedException e)
    {
        map.get(true).add(futureMapEntry.getKey());
        Thread.currentThread().interrupt();
    }

    I'll commit this fix to both branches.
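
For reference, the effect of that catch block can be tried out in isolation with a small sketch. This is not the committed HA-JDBC change: the class PresumeAliveOnInterruptDemo, the runInterrupted helper, the latch, and String names in place of Database<D> are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PresumeAliveOnInterruptDemo {

    // Simplified stand-in for getAliveMap with the suggested catch block.
    static Map<Boolean, List<String>> getAliveMap(List<String> databases,
            Callable<Boolean> isAlive, ExecutorService executor) {
        Map<String, Future<Boolean>> futureMap = new LinkedHashMap<>();
        for (String database : databases) {
            futureMap.put(database, executor.submit(isAlive));
        }

        Map<Boolean, List<String>> map = new TreeMap<>();
        map.put(false, new ArrayList<>());
        map.put(true, new ArrayList<>());

        for (Map.Entry<String, Future<Boolean>> entry : futureMap.entrySet()) {
            try {
                map.get(entry.getValue().get()).add(entry.getKey());
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            } catch (InterruptedException e) {
                // Presume the database is alive, then restore the interrupt flag.
                map.get(true).add(entry.getKey());
                Thread.currentThread().interrupt();
            }
        }
        return map;
    }

    // Calls getAliveMap with the calling thread's interrupt flag already set.
    static Map<Boolean, List<String>> runInterrupted() {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        CountDownLatch release = new CountDownLatch(1);
        // The check blocks until released, so get() has to wait and sees the interrupt.
        Callable<Boolean> isAlive = () -> {
            release.await();
            return Boolean.TRUE;
        };
        try {
            Thread.currentThread().interrupt();
            return getAliveMap(Arrays.asList("db1", "db2"), isAlive, executor);
        } finally {
            Thread.interrupted(); // clear the flag again for the caller
            release.countDown();  // let the pending checks finish
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runInterrupted());
    }
}
```

In this sketch an interrupted caller ends up with every database in the "alive" list and none in the "dead" list, so no database would be deactivated on the strength of an interrupted check and the alive list is never empty.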

     
  • Paul Ferraro

    Paul Ferraro - 2012-08-16

    Actually this only affects the 2.0 branch. In master, failure detection is
    done via SQLException introspection instead of a separate validation query.

     
