Menu

#1176 Problem: MDB stops reacting

v3.0 Rabbit Hole
closed-fixed
JBossMQ (152)
5
2003-09-26
2003-03-11
No

I would like to report a potential bug.
I've made a post in Jboss forums and there was some
responses from other people supposedly experiencing
simular problems.

Below, I repeat the problem description and our current
state of researching the roots of the problem:
=====
Setup:
Our system is centered around Jboss 3.0.4+Jetty
running on AS400. In short, it handles HTTP requests :)
Current (test environment) average load on the system is
somewhere around 30 requests per minute, peak values
so far were around 200. Production systems certainly as
usual has to handle more.

The basic workflow/design of the system can be
described as follows:
- requests are picked up by the servlets
- each type of request is handled by separate servlet
(actually the same class, but different config parameters
through web.xml)
- servlet parses the request, composes an object
message and submits it to the core via request JMS
queue with one of the properties set to specific constant
reflecting the request type
- servlet goes to timed-synchronous waiting cycle on
response queue, if no response arrives from core in 10
sec, a specific (timeout) response is sent to the client
- core "interface" is a set of MDB that call the rest
(Session and Entity beans) or respond back directly
through response queue. Instances of each different
MDB class handle request messages only of a certain
type specified via message selector in descriptor.

Problem:
After a certain time (~couple of days) one (or several, no
system visible) MDBs stop responding to messages. In
the logs it is visible that message was submitted to the
queue, after that nothing happens, clients eventually
receive timeout response. If we restart Jboss or
stop/destroy/create/start the jar with MDB through JMX
console everything starts working again.
===

I've noticed that in our case problems seem to start after
the following message appears in the logs: "WARNING:
NACK issued. The message consumer was not waiting
for a message."

Also we've made some more detailed research of logs
and Jboss source. After we enable tracing here is what
we've got:

2003-03-06 16:34:21,068 TRACE
[1046964853752:TracingInterceptor] CALLED : receive
2003-03-06 16:34:21,082 TRACE
[1046964853752:TracingInterceptor] ARG : -2147473121
2003-03-06 16:34:21,090 TRACE
[1046964853752:TracingInterceptor] ARG : -1
2003-03-06 16:34:21,090 TRACE
[1046964853752:ServerSecurityInterceptor] Checking
receive authorize on
SpyDistributedConnection:ID:78/ca35b8c9ad1c166f2156
0d812148ea9e subId=-2147473121
2003-03-06 16:34:21,092 TRACE
[1046964853752:TracingInterceptor] EXCEPTION :
receive:
javax.jms.JMSException: The provided subscription
does not exist
java/lang/Throwable.<init>(Ljava/lang/String;)V+4
(Throwable.java:90)
javax/jms/JMSException.<init>
(Ljava/lang/String;Ljava/lang/String;)V+0
(JMSException.java:0)
javax/jms/JMSException.<init>(Ljava/lang/String;)V+0
(JMSException.java:0)
org/jboss/mq/server/ClientConsumer.getSubscription(I)
Lorg/jboss/mq/Subscription;+0 (ClientConsumer.java:0)
org/jboss/mq/server/JMSDestinationManager.getSubscri
ption(Lorg/jboss/mq/ConnectionToken;I)
Lorg/jboss/mq/Subscription;+0
(JMSDestinationManager.java:0)
org/jboss/mq/server/JMSServerInterceptorSupport.getSub
scription(Lorg/jboss/mq/ConnectionToken;I)
Lorg/jboss/mq/Subscription;+0
(JMSServerInterceptorSupport.java:0)
org/jboss/mq/security/ServerSecurityInterceptor.receive
(Lorg/jboss/mq/ConnectionToken;IJ)
Lorg/jboss/mq/SpyMessage;+0
(ServerSecurityInterceptor.java:0)
org/jboss/mq/server/TracingInterceptor.receive
(Lorg/jboss/mq/ConnectionToken;IJ)
Lorg/jboss/mq/SpyMessage;+0
(TracingInterceptor.java:0)
org/jboss/mq/server/JMSServerInvoker.receive
(Lorg/jboss/mq/ConnectionToken;IJ)
Lorg/jboss/mq/SpyMessage;+0
(JMSServerInvoker.java:0)
org/jboss/mq/il/jvm/JVMServerIL.receive
(Lorg/jboss/mq/ConnectionToken;IJ)
Lorg/jboss/mq/SpyMessage;+0 (JVMServerIL.java:0)
org/jboss/mq/Connection.receive
(Lorg/jboss/mq/Subscription;J)
Lorg/jboss/mq/SpyMessage;+0 (Connection.java:0)
org/jboss/mq/SpyMessageConsumer.receiveNoWait()
Ljavax/jms/Message;+0 (SpyMessageConsumer.java:0)
com/vocognition/talkface/jms/JMSModel.checkNoWaitFo
rReplyTo
(Lcom/vocognition/talkface/talkman/TalkmanMessage;)
Lcom/vocognition/talkface/talkman/ModelMessage;+0
(JMSModel.java:73)
com/vocognition/talkface/talkman/LUTHandler.service
(Lcom/vocognition/talkface/talkman/TalkmanRequest;Lco
m/vocognition/talkface/talkman/TalkmanResponse;)V+0
(LUTHandler.java:27)
com/vocognition/talkface/protocol/tfshttp/JMSServlet.doG
et
(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http
/HttpServletResponse;)V+0 (JMSServlet.java:112)
javax/servlet/http/HttpServlet.service
(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http
/HttpServletResponse;)V+0 (HttpServlet.java:0)
javax/servlet/http/HttpServlet.service
(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResp
onse;)V+0 (HttpServlet.java:0)
org/mortbay/jetty/servlet/ServletHolder.handle
(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResp
onse;)V+0 (ServletHolder.java:334)
org/mortbay/jetty/servlet/WebApplicationHandler.dispatc
h
(Ljava/lang/String;Ljavax/servlet/http/HttpServletRequest;
Ljavax/servlet/http/HttpServletResponse;Lorg/mortbay/jett
y/servlet/ServletHolder;)V+0
(WebApplicationHandler.java:201)
org/mortbay/jetty/servlet/ServletHandler.handle
(Ljava/lang/String;Ljava/lang/String;Lorg/mortbay/http/Htt
pRequest;Lorg/mortbay/http/HttpResponse;)V+0
(ServletHandler.java:518)
org/mortbay/http/HttpContext.handle
(Ljava/lang/String;Ljava/lang/String;Lorg/mortbay/http/Htt
pRequest;Lorg/mortbay/http/HttpResponse;)Z+0
(HttpContext.java:1665)
org/mortbay/jetty/servlet/WebApplicationContext.handle
(Ljava/lang/String;Ljava/lang/String;Lorg/mortbay/http/Htt
pRequest;Lorg/mortbay/http/HttpResponse;)Z+0
(WebApplicationContext.java:540)
org/mortbay/http/HttpContext.handle
(Lorg/mortbay/http/HttpRequest;Lorg/mortbay/http/HttpR
esponse;)Z+0 (HttpContext.java:1576)
org/mortbay/http/HttpServer.service
(Lorg/mortbay/http/HttpRequest;Lorg/mortbay/http/HttpR
esponse;)Lorg/mortbay/http/HttpContext;+0
(HttpServer.java:840)
org/jboss/jetty/Jetty.service
(Lorg/mortbay/http/HttpRequest;Lorg/mortbay/http/HttpR
esponse;)Lorg/mortbay/http/HttpContext;+0
(Jetty.java:543)
org/mortbay/http/HttpConnection.service
(Lorg/mortbay/http/HttpRequest;Lorg/mortbay/http/HttpR
esponse;)Lorg/mortbay/http/HttpContext;+0
(HttpConnection.java:804)
org/mortbay/http/HttpConnection.handleNext()Z+0
(HttpConnection.java:871)
org/mortbay/http/HttpConnection.handle()V+0
(HttpConnection.java:822)
org/mortbay/http/SocketListener.handleConnection
(Ljava/net/Socket;)V+0 (SocketListener.java:176)
org/mortbay/util/ThreadedServer.handle
(Ljava/lang/Object;)V+0 (ThreadedServer.java:287)
org/mortbay/util/ThreadPool$JobRunner.run()V+0
(ThreadPool.java:658)
java/lang/Thread.run()V+11 (Thread.java:484)
2003-03-06 16:34:21,097 TRACE
[1046964853752:TracingInterceptor] RETURN : receive

We've added some more trace statements. The
subscription was definitely added, but there is no
evidence that it is removed. So it looks like it magically
dissappears :(

I am worried about one thing - in JbossMQ
ClientConsumer source all operations on subsriptions
HashMap are done in blocks synchronized on
subscriptions object itself by making clones, operating
on them and then setting subscriptions reference to the
new object (as in the following code snippets). I am not
sure why is it done like this, and I suspect that it
actually breaks the idea of synchronization because as
it looks right after the reference is changed another
thread may acquire a lock on a new object, while the
first thread is still holding a lock on the old one. Please
take a look:
===
public void addSubscription(Subscription req) throws
JMSException
{
if( log.isTraceEnabled() )
log.trace("Adding subscription for: " + req);
req.connectionToken = connectionToken;
req.clientConsumer = this;

JMSDestination jmsdest = (JMSDestination)
server.getJMSDestination(req.destination);
if (jmsdest == null)
{
throw new JMSException("The destination " +
req.destination + " does not exist !");
}

jmsdest.addSubscriber(req);

synchronized (subscriptions)
{
HashMap subscriptionsClone = (HashMap)
subscriptions.clone();
subscriptionsClone.put(new Integer(req.subscriptionId),
req);
subscriptions = subscriptionsClone;
}
}
===

public void close()
{
boolean trace = log.isTraceEnabled();
if( trace )
log.trace("" + this +"->close()");

synchronized (messages)
{
closed = true;
if (enqueued)
{
if( trace )
log.trace("" + this +"->close(): Cancelling work in
progress.");
threadPool.cancelWork(this);
enqueued = false;
}
}

synchronized (subscriptions)
{
Iterator i = subscriptions.keySet().iterator();
while (i.hasNext())
{
Integer subscriptionId = (Integer) i.next();
try
{
removeSubscription(subscriptionId.intValue());
}
catch(JMSException ignore)
{
}
}
}

// Nack the removed subscriptions, the connection is
gone
HashMap removedSubsClone = (HashMap) ((HashMap)
removedSubscriptions).clone();
Iterator i = removedSubsClone.values().iterator();
while (i.hasNext())
{
Subscription removed = (Subscription) i.next();
JMSDestination queue = (JMSDestination)
server.getJMSDestination(removed.destination);
if (queue == null)
log.warn("The subscription was registered with a
destination that does not exist: " + removed);
try
{
queue.nackMessages(removed);
}
catch (JMSException e)
{
log.warn("Unable to nack removed subscription: " +
removed, e);
}
}
}

Discussion

1 2 > >> (Page 1 of 2)
  • Eduard Letifov

    Eduard Letifov - 2003-03-11
    • milestone: 228434 --> v3.0 Rabbit Hole
     
  • Adrian Brock

    Adrian Brock - 2003-03-14
    • assigned_to: nobody --> ejort
     
  • Adrian Brock

    Adrian Brock - 2003-03-14

    Logged In: YES
    user_id=9459

    I've changed the ClientConsumer to not clone
    the subscriptions, except when iterating at close to avoid
    concurrent modification exceptions.
    The semantics of this synchronization are wrong.

    There are two other locations in the code doing a
    similar thing. The changes for these are still outstanding
    so I haven't closed this bug report yet.

    SpySession and consumers
    JMSDestinationManager and destinations

    Regards,
    Adrian

     
  • Eduard Letifov

    Eduard Letifov - 2003-04-16

    Logged In: YES
    user_id=438410

    Adrian,

    I have applied the patch, and deployed it as the customer's
    but unfortunately it haven't solved the main problem. MDB
    goes "offline" almoust everyday, and sometimes right from
    the start of the server.

    Please, can you at least direct me where to look next?

    BTW. I do have the trace logs, should I attach them here?

     
  • Adrian Brock

    Adrian Brock - 2003-04-16

    Logged In: YES
    user_id=9459

    Yes, attach the trace logs. Preferably the relevent part
    where it goes offline and a bit before.

    Is it possible for you test this with jboss3.0.7?
    There have been a number of fixes since 3.0.4
    that could be relevent.
    One was a fix where the subscription ids were
    not being generated uniquely.
    Another was a fix to the message consumer where
    it could stall if it nacked a message that had passed
    its time to live.
    Another was to update to the latest concurrent.jar

    Regards,
    Adrian

     
  • Eduard Letifov

    Eduard Letifov - 2003-04-17

    Logged In: YES
    user_id=438410

    Thank you, Adrian. I was thinking about this as well after I've
    seen your answer to some other bug. I will try it today.

    Attached are the server and trace logs. Jboss was just
    started. It has began answering requests with several MDBs
    (you can search for thread signatures 1050489832192,
    1050489845715, 1050489880169, 1050489913857,
    1050489922977, 1050489948331, 1050489976720,
    1050490009021 to see it).

    Then a request came for GetAssignmentPuts (see
    1050490010994, 1050490048469, 1050490082657,
    1050480116032) that should have invoked
    GetAssignmentPutsMDB but never did.

     
  • Eduard Letifov

    Eduard Letifov - 2003-04-17

    Logged In: YES
    user_id=438410

    Ok, upload failed. I am putting the logs and deployment
    descriptors on our ftp server.

    ftp://ftp.vocognition.com/share/jboss/20030416.zip

     
  • Eduard Letifov

    Eduard Letifov - 2003-05-09

    Logged In: YES
    user_id=438410

    Adrian,

    We experience the same problem with Jboss 3.0.7 and
    Jboss 3.2.0.
    Please tell me what can I do to help you solve it?

     
  • Eduard Letifov

    Eduard Letifov - 2003-05-09

    Logged In: YES
    user_id=438410

    Adrian,

    We experience the same problem with Jboss 3.0.7 and
    Jboss 3.2.0.
    Please tell me what can I do to help you solve it?

     
  • Eduard Letifov

    Eduard Letifov - 2003-05-09

    Logged In: YES
    user_id=438410

    Adrian,

    We just tried it with 3.2.1 - same problem

     
  • Eduard Letifov

    Eduard Letifov - 2003-05-14

    Logged In: YES
    user_id=438410

    Adrian,

    we have the same problem with Jboss 3.2.1 on Windows,
    looks like a thread deadlock or something. Silly question, but
    would thread dumps, trace logs help?

     
  • Adrian Brock

    Adrian Brock - 2003-05-14

    Logged In: YES
    user_id=9459

    Hi Eduard,

    I had a quick look at your logs over the weekend but
    I didn't come to any conclusion.

    I have a suspicion that are there are two problems
    that maybe related?

    1) The Nack when not waiting for a message in the
    temporary reply queue.
    2) The MDB not receiving any messages.

    If it is a deadlock you should be able to get a
    thread dump on windows using a ctrl-break.
    If you are using java 1.3 start jboss with -Xdebug
    so you can see what is locked.

    The logs you included did not include the full trace.
    information. Do you have the trace nterceptor
    configured in jbossmq-service.xml?
    Can you run it with it with TRACE enabled for
    org.jboss.mq
    but use INFO for the very noisy
    org.jboss.mq.server.MessageCache
    org.jboss.mq.server.MessageReference

    For the mdb you will also want to include
    org.jboss.jms
    org.jboss.ejb.plugins.jms

    Also the class org.jboss.mq.MessageConsumer
    which is where the NACK occurs doesn't have much
    TRACE logging.
    You might want to add some, in particular the
    changes for field waitingForMessage

    Regards,
    Adrian

     
  • Adrian Brock

    Adrian Brock - 2003-05-23

    Logged In: YES
    user_id=9459

    Hi can you try this again with latest CVS either 3.0 or 3.2
    I just fixed a problem in the ServerSession pool that
    was causing a similar problem with MQSeries
    as the jms provider.

    Regards,
    Adrian

     
  • Adrian Brock

    Adrian Brock - 2003-06-25

    Logged In: YES
    user_id=9459

    ping

     
  • Eduard Letifov

    Eduard Letifov - 2003-06-25

    Logged In: YES
    user_id=438410

    Sorry, Adrian.

    Yes, we took 3.0.8 and it seems to be working, at least here
    in the office we were not able to break it. We have delivered
    the version with 3.0.8 to the customer and now we just have
    to wait when their IT department installs it.

    Personally, I think you can close this bug.

    Ed.

     
  • Adrian Brock

    Adrian Brock - 2003-06-25
    • status: open --> closed
     
  • Adrian Brock

    Adrian Brock - 2003-06-25

    Logged In: YES
    user_id=9459

    Ok, reopen if you still have problems.

    I do have one outstanding fix to backport from 3.2 to 3.0,
    I'll try to find time to do in the next couple of days.

    Regards,
    Adrian

     
  • Eduard Letifov

    Eduard Letifov - 2003-09-01

    Logged In: YES
    user_id=438410

    Adrian,

    We have just experienced the same problem at the
    customer's with 3.0.8. Right after the Jboss is started the
    number of listeners on the queue is less by one that it should
    be. If I redeploy the ear it goes back again. Still after some
    activity it drops again.

    Looks exactly like it used to be... Sorry. Is there anything we
    can do to identify what the problem is?

    Ed

     
  • Eduard Letifov

    Eduard Letifov - 2003-09-01

    Logged In: YES
    user_id=438410

    Adrian,

    We have just experienced the same problem at the
    customer's with 3.0.8. Right after the Jboss is started the
    number of listeners on the queue is less by one that it should
    be. If I redeploy the ear it goes back again. Still after some
    activity it drops again.

    Looks exactly like it used to be... Sorry. Is there anything we
    can do to identify what the problem is?

    Ed

     
  • Adrian Brock

    Adrian Brock - 2003-09-01
    • status: closed --> open
     
  • Adrian Brock

    Adrian Brock - 2003-09-01

    Logged In: YES
    user_id=9459

    Did you go through the logging changes I mentioned in my
    previous post?

    Regards,
    Adrian

     
  • Eduard Letifov

    Eduard Letifov - 2003-09-04

    Logged In: YES
    user_id=438410

    No, we haven't yet, cause we managed to convince ourselves
    that it has been solved with 3.0.8

    Now, to define it more clearly today I witness the following:
    There is queue. It has 28 MDB's listening with selectors.
    If I start Jboss with my ear already being in the deploy dir the
    queue reports (in JMX view) 27 receivers. Then I "touch" my
    ear, it gets redeployed and receivers are back to 28. I cannot
    confirm that the problem exists in exactly the same form as it
    used to be (MDB going offline after some time) but at least it
    is very related one. Should I open a different bug & enable
    logging in something different?

     
  • Eduard Letifov

    Eduard Letifov - 2003-09-04

    Logged In: YES
    user_id=438410

    No, we haven't yet, cause we managed to convince ourselves
    that it has been solved with 3.0.8

    Now, to define it more clearly today I witness the following:
    There is queue. It has 28 MDB's listening with selectors.
    If I start Jboss with my ear already being in the deploy dir the
    queue reports (in JMX view) 27 receivers. Then I "touch" my
    ear, it gets redeployed and receivers are back to 28. I cannot
    confirm that the problem exists in exactly the same form as it
    used to be (MDB going offline after some time) but at least it
    is very related one. Should I open a different bug & enable
    logging in something different?

     
  • Eduard Letifov

    Eduard Letifov - 2003-09-26

    Logged In: YES
    user_id=438410

    Adrian.

    We still encounter the problem every time Jboss is restarted.
    The logs of such restart, with all the traces you've mentioned
    before enabled are here:

    ftp://ftp.vocognition.com/share/jboss/20030926.zip

    Ed.

     
  • Adrian Brock

    Adrian Brock - 2003-09-26

    Logged In: YES
    user_id=9459

    I'm not sure what I'm looking at:

    Your logs show everything stops at 9pm (was that a deadlock,
    did you take a threaddump?)

    You reboot at 1am, there are only three messages in the system,
    which are persistent messages recovered in to-talkman during
    recovery.

    Everything else is listening on empty destinations.

    Regards,
    Adrian

     
1 2 > >> (Page 1 of 2)

Log in to post a comment.

MongoDB Logo MongoDB