Re: [beep4j-user] Concurrency issues
Status: Alpha
Brought to you by:
rasss
From: Thomson, M. <Mar...@an...> - 2008-05-06 05:58:44
|
Hi Simon, Thanks for the prompt response. Unfortunately for me, it's not the info I was hoping for. The text you point out in Section 2.6.1 of RFC 3080 severely limits the usability of BEEP for asynchronous applications. Interesting to note that beep4j seems quite capable of sending replies out of order...something that I'm quite happy about because it makes my asynchronous hack much easier. (that is, please don't fix this ;) Anyhow, at least I now know why it is behaving as I am seeing. I'll look at exploring other avenues. I might come back in a little while with a big feature patch :) As a side note - forcing synchrony on your callers limits the flexibility offered by the interface. I see that as a bad idea. To give an example - tasks assigned to a thread pool might be used to generate requests and process responses. If the same thread from that pool generated a request to two channels (presumably in different tasks); only the second reply handler would be found by beep4j - the reply handler for the first would never see any response. Regarding the bad seqno problem, this is a pure threading issue. Generating a test case is non-trivial for any concurrent code. I can try to explain though, now that I've looked into it further. ResponseHandler.sendRPY is being invoked (near-)simultaneously on two different threads. Both threads reach nextFrame() at about the same time. -A-> } else if (senderWindow.remaining() >= MINIMUM_FRAME_SIZE) { LOG.debug("split frame at position " + senderWindow.remaining() + " (channel=" + channel + ")"); -B-> Frame[] split = frame.split(senderWindow.remaining()); frames.addFirst(split[1]); return split[0]; } else { Both frames are too big to fit, and they both pass the line marked with 'A' at about the same time. The first thread continues, the second stops somewhere between 'A' and 'B'. The first thread sends its message, getting the remainder of the window. The second thread then re-awakens and consequently gets 0 bytes. By this point the second thread has already decided to send a frame, to it continues even though the frame is empty. The weird ordering of subsequent frames occurs because frames.addFirst() is called for the remainder of the message from the first thread before it is called by the second thread, leaving the remainder of the message from the second thread at the head of the list. Looking at the call chain for sendRPY, it is not synchronized anywhere in the beep4j package, whereas all of the other calls are synchronized on the SessionImpl instance. Since the call path for sendRPY doesn't traverse SessionImpl, using that lock isn't an (easy) option, so it is probably best to ensure that DefaultChannelController is thread safe. I've attached a patch for DefaultChannelController that makes it thread safe, assuming that frames, window and senderWindow are the only mutable state in the class. The also includes a fix for sending the SEQ frame between the two parts of a split frame. I defer the sending of the SEQ if there isn't enough room to send another frame (senderWindow.remaining() < MINIMUM_FRAME_SIZE), assuming that the last frame was incomplete if the window has closed. Cheers, Martin > -----Original Message----- > From: Simon [mailto:co...@gm...] > Sent: Tuesday, 6 May 2008 6:33 AM > To: beep4j user list > Cc: Thomson, Martin > Subject: Re: [beep4j-user] Concurrency issues > > Hi > > First of all thanks for your interest in beep4j. My comments are > inlined below. However, first I have to tell you that I didn't have > much time to work on beep4j recently. So it is kind of unmaintained > (as you might have guessed from the activity statistics). However, > 0.1.1 seems to be successfully used by few people. Of course, it might > depend on how the application uses beep4j. Anyways, I'm more than > happy to incorporate patches and help you find your way around the > beep4j codebase. > > Cheers, > Simon > > On May 5, 2008, at 9:08 AM, Thomson, Martin wrote: > > > Hi, > > > > I'm seeing a few problems with beep4j under high loads in concurrent > > applications. > > > > > > ~ Out of order messages > > > > On inspection of the code in SessionImpl.java, it appears that out of > > order receipt of response messages causes a protocol exception (which > > drops the connection). The replyListeners collection doesn't > maintain > > the set of ReplyListener instances for each channel based on the > > message > > number - the collection is a list. When a response arrives it pulls > > the > > first item of the list and raises the exception if the number doesn't > > match. > > > > One of the main advantages that BEEP provides is the out of order > > receipt of responses. If this feature wasn't available, we might as > > well be using HTTP. > > BEEP does provide out of order receipt of messages on different > channels. However, replies on the same channel must be received in > order. See section 2.6 of RFC 3080. The only exception (which I find > somewhat strange) are frames of different ANSwers to the same MSG. > > > > > I have a small patch that changes SessionImpl.java from 0.1.1 that > > uses > > a Map rather than a List, i.e. > > Map<Integer, Map<Integer, ReplyListenerHolder>> replyListeners > > > > I can send out that patch; however, it appears that there are a large > > number of changes between 0.1.1 and what is on the SVN trunk. I am > > concerned that the changes don't fix this and could cause more > > problems. > > > > It's difficult to follow the thread of execution; but based on a > rough > > inspection, I see that the ReplyListener is now saved in > > FilterChainTargetHolder, which uses a ThreadLocal. This would > > present a > > problem if there were two requests sent from the same thread; the > > second > > (and any subsequent requests from that thread) would overwrite the > > ReplyListener from the first. > > > > The code in the trunk is really not tested that much. So it probably > still contains lots of bugs. The main feature is filtering on channels > (interceptors), well known from other specs (e.g. servlet filters). > Additionally, the whole internals of beep4j were rewritten to work on > frames instead of full assembled messages. This gives the receiver the > chance to operate on frames (if he desires so) and so for example to > terminate the session if the sender sends huges messages in an attempt > to cause an OutOfMemoryError in the receiver. > > The ThreadLocals must only be used in a synchronous call. i.e. when > you send a message, the ReplyListener is put into the ThreadLocal and > later picked up at the end of the filter chain. However, when the > control returns to your application code, it should be reset and there > should be no way you can override it, whether from the same thread or > another one. Of course, it depends also on the used filters (i.e. a > filter introducing asynchrony might cause troubles here if not > properly implemented). > > I believe that the trunk implementation should really be the > implementation to be pushed. > > > > > ~ Impact of congestion control for TCP mapping > > > > We are running some tests using a single BEEP channel to send a > > moderately high rate of messages between two applications. Both are > > using beep4j. Both do non-trivial processing on multiple threads. > We > > frequently get errors with seqno mismatches. > > > > The following (abridged) sequence is direct from a Wireshark packet > > capture: > > > > RPY 3 12 * 3784 312 > > <!-- first part of reply to 3-12 --> > > END > > RPY 3 9 * 4257 0 > > END > > SEQ 3 23232 4096 > > RPY 3 9 . 4257 473 > > <!-- reply to 3-9 --> > > END > > RPY 3 12 . 4096 161 > > <!-- second part of reply to 3-12 --> > > END > > RPY 3 6 . 5203 473 > > <!-- reply to 3-6 --> > > END > > > > As you can see - out of order message numbers on replies (I'm using > > the > > SessionImpl.java patch mentioned above). More alarming is the fact > > that > > beep4j sends the first part of RPY 3-9 before finishing RPY 3-12. > > What > > is causes the failure is the seqno mismatch it sees when the first > > (empty) part of RPY 3-9 is found. > > Frames on the same channel must be sent in order. Does this also > happen without your patch (which I think violates the BEEP > specification, see above)? > > > > > > > I think that the problem is that when there are two frames in the > > queue > > and the sender window (i.e. that advertised by the other end) is too > > small, TCPMapping splits the first frame (correctly), puts the > > remainder > > back on the queue (correctly), but then continues to try to process > > subsequent frames. It should not be sending anything more (except > > maybe > > the SEQ message) when it does not have any space left in the window. > > > > The following might fix that part: > > > > private Frame nextFrame() { > > + if (frames.isEmpty() || senderWindow.remaining() <= 0) { > > - if (frames.isEmpty()) { > > return null; > > > > But I'm not sure what the calling context expects from sendFrames. I > > expect that it should manage, but I'll need to test this further to > > find > > out. > > > > The whole frame splitting code in nextFrame() should not shuffle > frames. If a frame is split, the first part is sent, the second part > is added to the front of the queue and thus will be the first one to > be sent the next time. From a quick glance at the code I do not see > how this could happen. Could you send a unit test reproducing that > problem on the ChannelHandler implementation. > > > > > On the other hand, the following looks like a problem, but it might > > actually work. When TCPMapping receives a SEQ message it immediately > > calls transport.sendBytes(createSEQFrame(channel, ackno, windowSize)) > > rather than creating and queuing up a new Frame object. > > > > When TCPMapping receives a SEQ frame it merely updates the send window > size. Then, it tries to send queued frames. However, the SEQ frame is > only created and sent after receiving ordinary frames (see > frameReceived). Could you please clarify that one? Anyways, sending > the SEQ frame directly might actually be a real problem (see below). > > > I understand how this might be desirable, but it might be better (for > > the sake of interoperability) to defer the sending of this message if > > there is an outstanding split frame. From RFC 3080: > > > > When a message is segmented and sent as several frames, those > frames > > must be sent sequentially, without any intervening frames from > other > > messages on the same channel. > > > > I have a few partially formed thoughts on how to solve this, but I'll > > get back to that tomorrow. I'll test the simple senderWindow check > > once > > I have both fixes together. > > Hm, that's right. I've never thought about the situation where a SEQ > frame is interleaved between two frames of a message. So, a patch that > solves the problem is very welcome. > > > > > > > Let me know if anyone wants the SessionImpl.java patch for 0.1.1. > > > > Cheers, > > Martin > > ------------------------------------------------------------------------------------------------ This message is for the designated recipient only and may contain privileged, proprietary, or otherwise private information. If you have received it in error, please notify the sender immediately and delete the original. Any unauthorized use of this email is prohibited. ------------------------------------------------------------------------------------------------ [mf2] |